{"id":185,"date":"2026-01-26T13:10:56","date_gmt":"2026-01-26T05:10:56","guid":{"rendered":"https:\/\/yoyodyne.com.au\/?p=185"},"modified":"2026-01-27T08:31:00","modified_gmt":"2026-01-27T00:31:00","slug":"here-comes-the-beta-distribution","status":"publish","type":"post","link":"https:\/\/yoyodyne.com.au\/index.php\/2026\/01\/26\/here-comes-the-beta-distribution\/","title":{"rendered":"Here comes the Beta distribution!!!"},"content":{"rendered":"\n<p>Like flat-earthers, standard statistical models can sometimes be a little detached from reality. If you work in occupational hygiene, you\u2019re likely intimately familiar with the <strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Normal_distribution\" target=\"_blank\" rel=\"noreferrer noopener\">Normal<\/a> <\/strong>and <strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Log-normal_distribution\" target=\"_blank\" rel=\"noreferrer noopener\">Lognormal<\/a> <\/strong>distributions. They are the bread and butter of our industry, but they have a specific quirk that drives me nuts: they allow for &#8220;infinite tails.&#8221;<\/p>\n\n\n\n<p>Mathematically, this means that standard models always assign a tiny, non-zero probability to an exposure result above any level you care to name, no matter how absurd. Obviously, results like that are impossible. You can\u2019t have a concentration of a chemical higher than pure vapour, and you can\u2019t fit more dust into a room than the volume of the room itself. This is where the <strong><a href=\"https:\/\/en.wikipedia.org\/wiki\/Beta_distribution\" target=\"_blank\" rel=\"noreferrer noopener\">Beta Distribution<\/a><\/strong> steps in to save the day, and it\u2019s why I\u2019ve become such a vocal proponent of its use in our field (#AIOHStatsGang).<\/p>\n\n\n\n<p>The strength of the Beta distribution is that it has hard limits. It natively lives on the interval from 0 to 1, and a simple rescaling lets you define your own minimum and maximum. By setting a ceiling (an upper bound), we stop the model from predicting unrealistically high results. 
We are effectively telling the math, &#8220;Look, physically, the exposure cannot go higher than X.&#8221; This seemingly small change completely removes those phantom probabilities of impossible exposure levels, giving us a statistical picture that actually obeys the laws of physics.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"714\" src=\"https:\/\/yoyodyne.com.au\/wp-content\/uploads\/2026\/01\/Beta_LogNormal_fit-1-1024x714.png\" alt=\"\" class=\"wp-image-189\" srcset=\"https:\/\/yoyodyne.com.au\/wp-content\/uploads\/2026\/01\/Beta_LogNormal_fit-1-1024x714.png 1024w, https:\/\/yoyodyne.com.au\/wp-content\/uploads\/2026\/01\/Beta_LogNormal_fit-1-300x209.png 300w, https:\/\/yoyodyne.com.au\/wp-content\/uploads\/2026\/01\/Beta_LogNormal_fit-1-768x536.png 768w, https:\/\/yoyodyne.com.au\/wp-content\/uploads\/2026\/01\/Beta_LogNormal_fit-1.png 1042w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>In the chart above (generated in R), you can see that while both functions fit the data well, the Beta distribution respects a hard ceiling at 5, whereas the Lognormal tail simply drifts off into the impossible.<\/p>\n\n\n\n<p>But here&#8217;s the rub: switching to Beta isn&#8217;t for the faint of heart. The Lognormal distribution is easy; you can do it on a napkin with a $5 calculator. The Beta distribution is mathematically more complex and abstract. It\u2019s harder to explain to a layman, and the formulas are heftier. Still, I\u2019d rather struggle with the math for 10 minutes than present a risk assessment that implies a worker could be exposed to a billion ppm.<\/p>\n\n\n\n<p>This approach also forces us to be better specialists because it brings <strong>expert judgement<\/strong> back into the driver&#8217;s seat, or at least lets it do some backseat driving. 
Because the model relies on fixed boundaries, you can&#8217;t just feed it data and walk away. You have to decide where to put those lower and upper bounds based on the specific environment. Today, there are plenty of AI tools to help you fit the model, even in Excel. Choosing the bounds forces a deeper engagement with the process conditions rather than blind reliance on a pre-cooked spreadsheet (looking at you, IHStat). It might be a headache to set up initially, but a model that respects physical reality is a model worth using.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><strong>Further Reading<\/strong><\/p>\n\n\n\n<p>For a deeper dive into the mathematics of the Beta distribution, check out:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/stats.libretexts.org\/Bookshelves\/Probability_Theory\/Probability_Mathematical_Statistics_and_Stochastic_Processes_(Siegrist)\/05%3A_Special_Distributions\/5.14%3A_The_Beta_Distribution\" target=\"_blank\" rel=\"noreferrer noopener\">The Beta Distribution &#8211; LibreTexts<\/a><\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:0.8rem\"><code># This work is marked with CC0 1.0. \n# To view a copy of this license, visit:\n# https:\/\/creativecommons.org\/publicdomain\/zero\/1.0\/\n\n# Author: Dean Crouch\n# Date: 27 January 2026\n\n# Load necessary libraries\nif (!require(MASS)) install.packages(\"MASS\")\nif (!require(ggplot2)) install.packages(\"ggplot2\")\nlibrary(MASS)\nlibrary(ggplot2)\n\n# ===========================================================================\n# Function: fit_beta_and_lognormal\n# (Fits both distributions and plots them over &#91;min_val, max_val])\n# ===========================================================================\nfit_beta_and_lognormal &lt;- function(data, min_val = 0, max_val = 5) {\n  \n  # --- 0. 
VALIDATION CHECK ---\n  # Check if any data points fall outside the specified range\n  out_of_range &lt;- data&#91;data &lt; min_val | data > max_val]\n  if (length(out_of_range) > 0) {\n    message(\"Error: Data contains values outside the range &#91;\", min_val, \", \", max_val, \"]. Update your assumed min &amp; max. Exiting function.\")\n    message(\"Offending values: \", paste(out_of_range, collapse = \", \"))\n    return(NULL) # Exits the function early\n  }\n  \n  # --- 1. PREPARE DATA ---\n  # The check above guarantees every value is in range, so data_clean is\n  # simply data; the separate name keeps the later steps easy to follow.\n  data_clean &lt;- data \n  epsilon &lt;- 1e-6 # Nudge for values sitting exactly on a boundary\n  \n  # --- 2. FIT BETA (Red) ---\n  # Normalize to &#91;0, 1], then nudge exact 0s and 1s inside the interval,\n  # where the Beta density can be zero or infinite and break the fit\n  data_norm &lt;- (data_clean - min_val) \/ (max_val - min_val)\n  data_norm&#91;data_norm &lt;= 0] &lt;- epsilon\n  data_norm&#91;data_norm >= 1] &lt;- 1 - epsilon\n  \n  fit_beta &lt;- fitdistr(data_norm, \"beta\", start = list(shape1 = 1, shape2 = 1))\n  alpha_est &lt;- fit_beta$estimate&#91;\"shape1\"]\n  beta_est &lt;- fit_beta$estimate&#91;\"shape2\"]\n  \n  # --- 3. FIT LOG-NORMAL (Blue) ---\n  # The lognormal needs strictly positive data, so non-positive values are nudged up\n  data_lnorm &lt;- data_clean\n  data_lnorm&#91;data_lnorm &lt;= 0] &lt;- epsilon \n  \n  fit_lnorm &lt;- fitdistr(data_lnorm, \"lognormal\")\n  meanlog_est &lt;- fit_lnorm$estimate&#91;\"meanlog\"]\n  sdlog_est &lt;- fit_lnorm$estimate&#91;\"sdlog\"]\n  \n  # --- 4. 
PLOTTING ---\n  df &lt;- data.frame(val = data_clean)\n  \n  p &lt;- ggplot(df, aes(x = val)) +\n    # Histogram\n    geom_histogram(aes(y = after_stat(density)), bins = 40, fill = \"lightgray\", color = \"white\") +\n    \n    # Beta Curve (RED)\n    stat_function(fun = function(x) {\n      dbeta((x - min_val) \/ (max_val - min_val), alpha_est, beta_est) \/ (max_val - min_val)\n    }, aes(colour = \"Beta (Red)\"), linewidth = 1.2) +\n    \n    # Log-Normal Curve (BLUE)\n    stat_function(fun = function(x) {\n      dlnorm(x, meanlog_est, sdlog_est)\n    }, aes(colour = \"Log-Normal (Blue)\"), linewidth = 1.2) +\n    \n    # Styling\n    scale_x_continuous(limits = c(min_val, max_val)) +\n    scale_colour_manual(name = \"Fitted Models\", \n                        values = c(\"Beta (Red)\" = \"red\", \"Log-Normal (Blue)\" = \"blue\")) +\n    labs(title = paste(\"Fit Comparison &#91;\", min_val, \"-\", max_val, \"]\"),\n         subtitle = \"Red = Beta Fit | Blue = Log-Normal Fit\",\n         x = \"Value\", y = \"Density\") +\n    theme_minimal() +\n    theme(legend.position = \"top\")\n  \n  print(p)\n}\n\n# ===========================================================================\n# Usage Example with BETA Data\n# ===========================================================================\nset.seed(42)\n\n# 1. Define Range\nmy_min &lt;- 0\nmy_max &lt;- 5\n\n# 2. Generate Dummy Data using rbeta (The \"True\" Source)\n# We use shape1=2 and shape2=5, then scale it to 0-5\nobserved_data &lt;- (rbeta(n = 10, shape1=2, shape2=5) * (my_max - my_min)) + my_min\n\n# 3. Run the Fitting Function\nfit_beta_and_lognormal(observed_data, min_val = my_min, max_val = my_max)\n<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Like flat-earthers, standard statistical models can sometimes be a little detached from reality. If you work in occupational hygiene, you\u2019re likely intimately familiar with the Normal and Lognormal distributions. 
They are the bread and butter of our industry, but they have a specific quirk that drives me nuts: they allow for &#8220;infinite tails.&#8221; Mathematically, this [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[32],"tags":[31],"class_list":["post-185","post","type-post","status-publish","format-standard","hentry","category-statistics","tag-stats"],"_links":{"self":[{"href":"https:\/\/yoyodyne.com.au\/index.php\/wp-json\/wp\/v2\/posts\/185","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/yoyodyne.com.au\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/yoyodyne.com.au\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/yoyodyne.com.au\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/yoyodyne.com.au\/index.php\/wp-json\/wp\/v2\/comments?post=185"}],"version-history":[{"count":10,"href":"https:\/\/yoyodyne.com.au\/index.php\/wp-json\/wp\/v2\/posts\/185\/revisions"}],"predecessor-version":[{"id":198,"href":"https:\/\/yoyodyne.com.au\/index.php\/wp-json\/wp\/v2\/posts\/185\/revisions\/198"}],"wp:attachment":[{"href":"https:\/\/yoyodyne.com.au\/index.php\/wp-json\/wp\/v2\/media?parent=185"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/yoyodyne.com.au\/index.php\/wp-json\/wp\/v2\/categories?post=185"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/yoyodyne.com.au\/index.php\/wp-json\/wp\/v2\/tags?post=185"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}