{"id":2419,"date":"2017-08-24T17:32:53","date_gmt":"2017-08-24T17:32:53","guid":{"rendered":"https:\/\/www.kolabtree.com\/blog\/?p=2419"},"modified":"2019-01-02T11:52:17","modified_gmt":"2019-01-02T11:52:17","slug":"fishers-exact-test-statistical-relationships","status":"publish","type":"post","link":"https:\/\/www.kolabtree.com\/blog\/fishers-exact-test-statistical-relationships\/","title":{"rendered":"Using Fisher\u2019s Exact Test for Small Sample Contingency Tables"},"content":{"rendered":"<p><em>This post is authored by <a href=\"https:\/\/www.kolabtree.com\/find-an-expert\/paul-ricci\/?utm_source=Blog_Fisher\">Paul Ricci<\/a>, a Kolabtree expert. It originally appeared in his column on <a href=\"http:\/\/datadrivenjournalism.net\/news_and_analysis\/using_fishers_exact_test_to_unearth_stories_about_statistical_relationships\">Data Driven Journalism<\/a>.<\/em><\/p>\n<p>This article outlines how Fisher&#8217;s exact test can be used for small sample contingency tables. A common problem in<a href=\"https:\/\/www.kolabtree.com\/find-an-expert\/subject\/data-analysis?utm_source=Blog_Fisher\"> data analysis<\/a> is how to determine if there is a statistical relationship between two categorical variables such as gender, race, or the share of the vote for two candidates in an election.\u00a0 The simplest way to visualize the relationship is to represent the counts for each combination of two variables in a contingency table with the rows representing the levels of one variable and the columns representing the levels of the other variable.\u00a0 The most commonly used statistical test for an association between the row and column variables is the chi-square (\u03c7<sup>2<\/sup>) test.\u00a0 The example in the table below is given to illustrate the test.<\/p>\n<table>\n<tbody>\n<tr>\n<td rowspan=\"2\" width=\"158\"><\/td>\n<td colspan=\"2\" width=\"320\">Democrat Winner (% of column)<\/td>\n<td rowspan=\"2\" width=\"145\">Total<\/td>\n<\/tr>\n<tr>\n<td 
width=\"159\">Clinton win<\/td>\n<td width=\"161\">Sanders win<\/td>\n<\/tr>\n<tr>\n<td width=\"158\">Trump 1st<\/td>\n<td width=\"159\">25 (86%)<\/td>\n<td width=\"161\">12 (55%)<\/td>\n<td width=\"145\">37<\/td>\n<\/tr>\n<tr>\n<td width=\"158\">Trump 2nd<\/td>\n<td width=\"159\">3 (11%)<\/td>\n<td width=\"161\">8 (36%)<\/td>\n<td width=\"145\">11<\/td>\n<\/tr>\n<tr>\n<td width=\"158\">Trump 3rd<\/td>\n<td width=\"159\">1 (3%)<\/td>\n<td width=\"161\">2 (9%)<\/td>\n<td width=\"145\">3<\/td>\n<\/tr>\n<tr>\n<td width=\"158\">Total<\/td>\n<td width=\"159\">29 (100%)<\/td>\n<td width=\"161\">22 (100%)<\/td>\n<td width=\"145\">51<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>The columns in the above table show the primary states won by Hillary Clinton and by Bernie Sanders on the Democratic side, and the rows show where Donald Trump placed in the same primary states on the Republican side. The total number of states in the table is 51 because the District of Columbia is included. The column percentages show that Trump won 86% of the primary states that Clinton won while he won 55% of the states that Sanders won.<\/p>\n<p>The chi-square test is based on calculating expected values for each cell in the table. For example, the expected value (the value for the cell which one would expect to see if there were no relationship among the variables) for the cell for states where Trump finished third on the Republican side and Bernie Sanders won on the Democratic side would be computed by multiplying the row total for where Trump placed third (3) by the column total for states where Sanders won (22). This product is then divided by the total number of observations (51). 
The formula for the expected value is given by:<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-large wp-image-2422\" src=\"https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/expected-value-1024x74.png\" alt=\"\" width=\"702\" height=\"51\" srcset=\"https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/expected-value-1024x74.png 1024w, https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/expected-value-300x22.png 300w, https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/expected-value-768x56.png 768w, https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/expected-value-1536x112.png 1536w, https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/expected-value-2048x149.png 2048w, https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/expected-value-1080x79.png 1080w, https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/expected-value-300x22@2x.png 600w\" sizes=\"(max-width: 702px) 100vw, 702px\" \/><\/p>\n<p>That means that for this cell a value of 1.29 would be expected if the primary states where Trump finished third and Sanders won were completely independent of each other.\u00a0 The observed value for this cell is 2 suggesting a higher count for this cell than would be expected.\u00a0 Expected values would be computed for each cell in the table and the difference between the observed and expected values for each cell is computed, squared, divided by the expected value, and summed across the cells in the table according to the formula:<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-large wp-image-2423\" src=\"https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/chi-square-1024x87.png\" alt=\"\" width=\"702\" height=\"60\" srcset=\"https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/chi-square-1024x87.png 1024w, 
https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/chi-square-300x25.png 300w, https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/chi-square-768x65.png 768w, https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/chi-square-1536x130.png 1536w, https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/chi-square-2048x174.png 2048w, https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/chi-square-1080x92.png 1080w, https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/chi-square-300x25@2x.png 600w\" sizes=\"(max-width: 702px) 100vw, 702px\" \/><\/p>\n<p>If the chi-square statistic exceeds the critical value for the table\u2019s degrees of freedom (found by multiplying the number of rows minus one by the number of columns minus one) at the chosen significance level, it is concluded that there is an association between the variables.<\/p>\n<p>There is a problem with the chi-square test.\u00a0 It relies on an approximation to the distribution of counts in contingency tables.\u00a0 If more than 20% of the cells in the table have an expected value of less than five, the chi-square approximation is not reliable for testing the hypothesis of an association between the row variable and the column variable (as is the case in the table above, where three of the six cells have expected values below five).\u00a0 Both variables in the table are categorical.\u00a0 The major statistical packages will alert the user if this assumption is violated.\u00a0 Violating the assumption makes the reported p-value unreliable and can lead to incorrect conclusions regarding the presence or absence of an association.\u00a0 There is an exact alternative to the chi-square test called Fisher\u2019s exact test.<\/p>\n<p><a href=\"http:\/\/mathworld.wolfram.com\/FishersExactTest.html\">Fisher\u2019s exact test is based on the hypergeometric probability distribution.<\/a><\/p>\n<p><img decoding=\"async\" loading=\"lazy\" class=\"alignnone size-full wp-image-2424\" 
src=\"https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/hypergeometric.png\" alt=\"\" width=\"253\" height=\"41\" \/><\/p>\n<p>Here the <em>R<sub>i<\/sub>!<\/em> are the factorials of the row totals (for example, 5! = 5\u00d74\u00d73\u00d72\u00d71), <em>C<sub>i<\/sub>!<\/em> are the factorials of the individual column totals, <em>N!<\/em> is the factorial of the table total and the a<sub>ij<\/sub>! are the factorials of the individual cell values.\u00a0 The \u03a0<sub>ij <\/sub>denotes the product over all of the individual cells.\u00a0 Such a formula is even more computationally intensive than the chi-square test, especially for tables with many rows and columns.\u00a0 This is why the chi-square test was favored in the past: the exact test took too much memory for computers to run.\u00a0 These days it is less of an issue for computers to run Fisher\u2019s exact test and it is easy to run in the major statistical packages (R, SAS, SPSS, STATA, etc.).<\/p>\n<p>The commands to conduct Fisher\u2019s exact test and the chi-square test in R (a free program) can be seen below for the table at the top of the article with the corresponding output (yellow for Fisher\u2019s exact test, green for the chi-square test).<\/p>\n<p><strong><img decoding=\"async\" loading=\"lazy\" class=\"alignleft wp-image-2444 size-large\" src=\"https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/fishersexact-1024x413.png\" alt=\"\" width=\"702\" height=\"283\" srcset=\"https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/fishersexact-1024x413.png 1024w, https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/fishersexact-300x121.png 300w, https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/fishersexact-768x310.png 768w, https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/fishersexact-1080x436.png 1080w, https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/fishersexact.png 1458w, 
https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/fishersexact-300x121@2x.png 600w\" sizes=\"(max-width: 702px) 100vw, 702px\" \/><\/strong><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>The output for Fisher&#8217;s exact test shows that, if there were no association between the rows and columns, the probability of observing table frequencies at least as extreme as these would be 0.03653.\u00a0 The chi-square test output shows a p-value of 0.04217 for the same table.\u00a0 If we were using a p-value of .05 as the criterion for significance, we would find a relationship with both tests in this case, though the p-values differ.\u00a0 States which Hillary Clinton won in the primary season were more likely to be won by Donald Trump, while states where Bernie Sanders won were more likely to have Trump finish 2<sup>nd<\/sup> or 3<sup>rd<\/sup>.\u00a0 In tables with even smaller sample sizes the difference between the p-values may be even greater, which can lead to different conclusions.<\/p>\n<p>As a warning, the p-value should not be used as an indicator of the strength of the association between categorical variables.\u00a0 Either the test is significant or not.\u00a0 The p-value is sensitive to sample size.\u00a0 Often the odds ratio is used to estimate the effect size, but R only computes it in the fisher.test function for tables with 2 columns and 2 rows.<\/p>\n<p>Fisher\u2019s exact test provides a criterion for deciding whether the differences in observed percentages between two categorical variables in a sample are significant or just due to random noise in the data.\u00a0 In the above example, the 86% of Clinton\u2019s primary states that Trump won is significantly different from the 55% of Sanders\u2019 primary states that Trump won.\u00a0 Journalists should always be careful about making these judgments by just looking at observed percentages or counts because of the subjectivity of such 
decisions.\u00a0 Subjective decisions can be further clouded by one\u2019s preconceived notions about the issues related to the data.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post is authored by Paul Ricci, a Kolabtree expert. It originally appeared in his column on Data Driven Journalism. This article outlines how Fisher&#8217;s exact test can be used for small sample contingency tables. A common problem in data analysis is how to determine if there is a statistical relationship between two categorical variables<\/p>\n<div class=\"read-more\"><a href=\"https:\/\/www.kolabtree.com\/blog\/fishers-exact-test-statistical-relationships\/\" title=\"Read More\">Read More<\/a><\/div>\n","protected":false},"author":31,"featured_media":2445,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[398,247,433],"tags":[],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v20.1 (Yoast SEO v20.1) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Fisher&#039;s exact test for determining statistical relationships<\/title>\n<meta name=\"description\" content=\"Using Fisher&#039;s exact test in R to determine the relationship between two variables in a contingency table - an example using the American elections.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.kolabtree.com\/blog\/fishers-exact-test-statistical-relationships\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Using Fisher\u2019s Exact Test for Small Sample Contingency Tables\" \/>\n<meta property=\"og:description\" content=\"Using Fisher&#039;s exact test in R to determine the relationship between two variables in a contingency table - an example using the American elections.\" 
\/>\n<meta property=\"og:url\" content=\"https:\/\/www.kolabtree.com\/blog\/fishers-exact-test-statistical-relationships\/\" \/>\n<meta property=\"og:site_name\" content=\"The Kolabtree Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/kolabtree\" \/>\n<meta property=\"article:published_time\" content=\"2017-08-24T17:32:53+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2019-01-02T11:52:17+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/business-2089533_1920.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1920\" \/>\n\t<meta property=\"og:image:height\" content=\"1222\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Paul Ricci\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@CSIwoDB\" \/>\n<meta name=\"twitter:site\" content=\"@kolabtree\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Paul Ricci\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Fisher's exact test for determining statistical relationships","description":"Using Fisher's exact test in R to determine the relationship between two variables in a contingency table - an example using the American elections.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.kolabtree.com\/blog\/fishers-exact-test-statistical-relationships\/","og_locale":"en_US","og_type":"article","og_title":"Using Fisher\u2019s Exact Test for Small Sample Contingency Tables","og_description":"Using Fisher's exact test in R to determine the relationship between two variables in a contingency table - an example using the American elections.","og_url":"https:\/\/www.kolabtree.com\/blog\/fishers-exact-test-statistical-relationships\/","og_site_name":"The Kolabtree Blog","article_publisher":"https:\/\/www.facebook.com\/kolabtree","article_published_time":"2017-08-24T17:32:53+00:00","article_modified_time":"2019-01-02T11:52:17+00:00","og_image":[{"width":1920,"height":1222,"url":"https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/business-2089533_1920.jpg","type":"image\/jpeg"}],"author":"Paul Ricci","twitter_card":"summary_large_image","twitter_creator":"@CSIwoDB","twitter_site":"@kolabtree","twitter_misc":{"Written by":"Paul Ricci","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.kolabtree.com\/blog\/fishers-exact-test-statistical-relationships\/#article","isPartOf":{"@id":"https:\/\/www.kolabtree.com\/blog\/fishers-exact-test-statistical-relationships\/"},"author":{"name":"Paul Ricci","@id":"https:\/\/www.kolabtree.com\/blog\/#\/schema\/person\/d3ae828656a4c84a3a1b7cdba371820f"},"headline":"Using Fisher\u2019s Exact Test for Small Sample Contingency Tables","datePublished":"2017-08-24T17:32:53+00:00","dateModified":"2019-01-02T11:52:17+00:00","mainEntityOfPage":{"@id":"https:\/\/www.kolabtree.com\/blog\/fishers-exact-test-statistical-relationships\/"},"wordCount":1085,"commentCount":0,"publisher":{"@id":"https:\/\/www.kolabtree.com\/blog\/#organization"},"articleSection":["Data Science","Guest posts","Tech"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.kolabtree.com\/blog\/fishers-exact-test-statistical-relationships\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.kolabtree.com\/blog\/fishers-exact-test-statistical-relationships\/","url":"https:\/\/www.kolabtree.com\/blog\/fishers-exact-test-statistical-relationships\/","name":"Fisher's exact test for determining statistical relationships","isPartOf":{"@id":"https:\/\/www.kolabtree.com\/blog\/#website"},"datePublished":"2017-08-24T17:32:53+00:00","dateModified":"2019-01-02T11:52:17+00:00","description":"Using Fisher's exact test in R to determine the relationship between two variables in a contingency table - an example using the American 
elections.","breadcrumb":{"@id":"https:\/\/www.kolabtree.com\/blog\/fishers-exact-test-statistical-relationships\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.kolabtree.com\/blog\/fishers-exact-test-statistical-relationships\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.kolabtree.com\/blog\/fishers-exact-test-statistical-relationships\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.kolabtree.com\/blog\/"},{"@type":"ListItem","position":2,"name":"Using Fisher\u2019s Exact Test for Small Sample Contingency Tables"}]},{"@type":"WebSite","@id":"https:\/\/www.kolabtree.com\/blog\/#website","url":"https:\/\/www.kolabtree.com\/blog\/","name":"The Kolabtree Blog","description":"Expert Views on Science, Innovation and Product Development","publisher":{"@id":"https:\/\/www.kolabtree.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.kolabtree.com\/blog\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.kolabtree.com\/blog\/#organization","name":"Kolabtree","url":"https:\/\/www.kolabtree.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.kolabtree.com\/blog\/#\/schema\/logo\/image\/","url":"","contentUrl":"","caption":"Kolabtree"},"image":{"@id":"https:\/\/www.kolabtree.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/kolabtree","https:\/\/twitter.com\/kolabtree","https:\/\/instagram.com\/kolabtree","https:\/\/www.linkedin.com\/company\/kolabtree","https:\/\/en.m.wikipedia.org\/wiki\/Kolabtree"]},{"@type":"Person","@id":"https:\/\/www.kolabtree.com\/blog\/#\/schema\/person\/d3ae828656a4c84a3a1b7cdba371820f","name":"Paul 
Ricci","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.kolabtree.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/Race-4-the-Cure-006-96x96.jpg","contentUrl":"https:\/\/www.kolabtree.com\/blog\/wp-content\/uploads\/2017\/08\/Race-4-the-Cure-006-96x96.jpg","caption":"Paul Ricci"},"description":"Paul Ricci is a statistician, neuropsychologist and data analyst based in the USA. He writes a regular column for the website Data Driven Journalism and has an MA in Research Methodology and Neuroscience, and an MS in Biostatistics.","sameAs":["https:\/\/csiwodeadbodies.blogspot.com\/","https:\/\/twitter.com\/@CSIwoDB"],"url":"https:\/\/www.kolabtree.com\/blog\/author\/paul-ricci\/"}]}},"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.kolabtree.com\/blog\/wp-json\/wp\/v2\/posts\/2419"}],"collection":[{"href":"https:\/\/www.kolabtree.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kolabtree.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kolabtree.com\/blog\/wp-json\/wp\/v2\/users\/31"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kolabtree.com\/blog\/wp-json\/wp\/v2\/comments?post=2419"}],"version-history":[{"count":6,"href":"https:\/\/www.kolabtree.com\/blog\/wp-json\/wp\/v2\/posts\/2419\/revisions"}],"predecessor-version":[{"id":3648,"href":"https:\/\/www.kolabtree.com\/blog\/wp-json\/wp\/v2\/posts\/2419\/revisions\/3648"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.kolabtree.com\/blog\/wp-json\/wp\/v2\/media\/2445"}],"wp:attachment":[{"href":"https:\/\/www.kolabtree.com\/blog\/wp-json\/wp\/v2\/media?parent=2419"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kolabtree.com\/blog\/wp-json\/wp\/v2\/categories?post=2419"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kolabtree.com\/blog\/wp-json\/wp\/v2\/tags?post=2419"}],"curies":[{"name":"wp"
,"href":"https:\/\/api.w.org\/{rel}","templated":true}]}}