Skip to contents

Introduction

The edfinr package provides a simple, consistent interface for accessing comprehensive education finance data for U.S. school districts. This vignette will help you get started with the package’s core functionality.

Core Function: get_finance_data()

The primary function in edfinr is get_finance_data(), which provides access to comprehensive school finance data from school years 2011-12 through 2021-22. The function combines data from multiple sources:

  • Financial data: Revenue and expenditure data from the National Center for Education Statistics (NCES) version of the F-33 survey.
  • Enrollment: Student counts from NCES Common Core of Data.
  • Demographics: Poverty estimates from the U.S. Census Bureau Small Area Income and Poverty Estimates (SAIPE).
  • Community characteristics: Income and education data from American Community Survey (ACS).
  • Inflation adjustments: Consumer Price Index for All Urban Consumers (CPI-U) data for constant dollar calculations.

Basic Usage

The simplest way to use get_finance_data() is to specify a year and state. For example, to get finance data for Kentucky school districts from the 2015-16 school year:

# download "skinny" dataset for a single year and a single staet
ky_sy16 <- get_finance_data(yr = 2016, geo = "KY")

# view the structure of the returned data
glimpse(ky_sy16)
## Rows: 173
## Columns: 41
## $ ncesid          <chr> "2100030", "2100070", "2100081", "2100090", "2100120",
## $ year            <chr> "2016", "2016", "2016", "2016", "2016", "2016", "2016"…
## $ state           <chr> "KY", "KY", "KY", "KY", "KY", "KY", "KY", "KY", "KY", 
## $ dist_name       <chr> "Adair County", "Allen County", "Muhlenberg County", "…
## $ enroll          <dbl> 2689, 3071, 5023, 363, 3811, 3265, 273, 1371, 718, 277…
## $ rev_total_pp    <dbl> 10622.908, 9635.949, 11004.977, 20374.656, 9874.574, 1…
## $ rev_local_pp    <dbl> 1922.276, 1924.129, 3204.260, 14935.863, 3189.452, 270…
## $ rev_state_pp    <dbl> 7046.858, 6523.608, 6721.481, 5081.578, 5700.866, 6064…
## $ rev_fed_pp      <dbl> 1653.7746, 1188.2123, 1079.2355, 357.2145, 984.2561, 1…
## $ rev_total       <dbl> 28565000, 29592000, 55278000, 7396000, 37632000, 34758…
## $ rev_local       <dbl> 5169000.0, 5909000.0, 16095000.0, 5421718.4, 12155000.…
## $ rev_state       <dbl> 18949000, 20034000, 33762000, 1844613, 21726000, 19799…
## $ rev_fed         <dbl> 4447000.0, 3649000.0, 5421000.0, 129668.9, 3751000.0, 
## $ rev_total_unadj <dbl> 30303000, 31613000, 58033000, 7621000, 38072000, 36802…
## $ rev_local_unadj <dbl> 5169000, 6166000, 16125000, 5561000, 12156000, 8970000…
## $ rev_state_unadj <dbl> 20687000, 21798000, 36487000, 1927000, 22165000, 21685…
## $ rev_fed_unadj   <dbl> 4447000, 3649000, 5421000, 133000, 3751000, 6147000, 3…
## $ exp_cur_pp      <dbl> 9461.510, 8820.580, 9074.457, 18002.755, 8372.343, 999…
## $ rev_exp_pp_diff <dbl> 1161.3983, 815.3696, 1930.5196, 2371.9008, 1502.2304, 
## $ exp_cur_st_loc  <dbl> 22888000, 25129000, 39656000, 6443000, 29720000, 28532…
## $ exp_cur_fed     <dbl> 2554000, 1959000, 5925000, 92000, 2187000, 4090000, 21…
## $ exp_cur_resa    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
## $ exp_cur_total   <dbl> 25442000, 27088000, 45581000, 6535000, 31907000, 32622…
## $ cpi_sy12        <dbl> 1.047057, 1.047057, 1.047057, 1.047057, 1.047057, 1.04…
## $ mhi             <dbl> 33873, 41166, 40445, 162188, 53513, 39526, 34871, 4392…
## $ mpv             <dbl> 86100, 98200, 81400, 604100, 136300, 96500, 107300, 10…
## $ adult_pop       <dbl> 12753, 13827, 21649, 1372, 14970, 14736, 902, 5766, 17…
## $ ba_plus_pop     <dbl> 2103, 1885, 2627, 1133, 2835, 3419, 158, 880, 478, 139…
## $ ba_plus_pct     <dbl> 0.16490238, 0.13632748, 0.12134510, 0.82580175, 0.1893…
## $ total_pop       <dbl> 19280, 20631, 31028, 2531, 22158, 21143, 1423, 8054, 3…
## $ student_pop     <dbl> 2928, 3633, 4768, 433, 3937, 3253, 249, 1340, 434, 239…
## $ stpov_pop       <dbl> 967, 905, 1130, 23, 536, 894, 81, 287, 154, 530, 1288,
## $ stpov_pct       <dbl> 0.33025956, 0.24910542, 0.23699664, 0.05311778, 0.1361…
## $ cong_dist       <int> 2101, 2101, 2101, 2103, 2106, 2104, 2104, 2101, 2105, 
## $ state_leaid     <chr> "KY-001001000", "KY-002005000", "KY-089445000", "KY-05…
## $ county          <chr> "Adair County", "Allen County", "Muhlenberg County", "…
## $ cbsa            <chr> "N", "14540", "N", "31140", "23180", "26580", "17140",
## $ urbanicity      <fct> Town, Rural, Rural, Suburb, Town, City, Rural, Rural, 
## $ schlev          <chr> "03", "03", "03", "01", "03", "03", "03", "03", "03", 
## $ lea_type        <fct> Regular public school district that is not a component…
## $ lea_type_id     <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 

Dataset Types: Skinny vs. Full

By default, get_finance_data() returns a “skinny” dataset with 41 essential variables covering: - District identifiers and characteristics. - Total revenues by source (local, state, federal). - Current expenditures. - Key demographic and economic indicators.

For more detailed analysis, you can request the “full” dataset with 89 variables that includes: - All skinny dataset variables. - Detailed expenditure data. - Data on spending of temporary pandemic-related federal funding.

# download the full dataset with detailed expenditure data for a single year/state
ky_full_sy16 <- get_finance_data(yr = "2016", geo = "KY", dataset_type = "full")

# view additional variables in "full" dataset
names(ky_full_sy16)[42:89]
##  [1] "exp_emp_salary"             "exp_emp_bene"              
##  [3] "exp_textbooks"              "exp_utilities"             
##  [5] "exp_tech_supp"              "exp_tech_equip"            
##  [7] "exp_pay_private_sch"        "exp_pay_charter_sch"       
##  [9] "exp_pay_other_lea"          "exp_other_sys_pay"         
## [11] "exp_instr_total"            "exp_instr_sal"             
## [13] "exp_instr_bene"             "exp_supp_stu_total"        
## [15] "exp_supp_stu_sal"           "exp_supp_stu_bene"         
## [17] "exp_supp_instr_total"       "exp_supp_instr_sal"        
## [19] "exp_supp_instr_bene"        "exp_supp_gen_admin_total"  
## [21] "exp_supp_gen_admin_sal"     "exp_supp_gen_admin_bene"   
## [23] "exp_supp_sch_admin_total"   "exp_supp_sch_admin_sal"    
## [25] "exp_supp_sch_admin_bene"    "exp_supp_ops_total"        
## [27] "exp_supp_ops_sal"           "exp_supp_ops_bene"         
## [29] "exp_supp_trans_total"       "exp_supp_trans_sal"        
## [31] "exp_supp_trans_bene"        "exp_central_serv_total"    
## [33] "exp_central_serv_sal"       "exp_central_serv_bene"     
## [35] "exp_noninstr_food_total"    "exp_noninstr_food_sal"     
## [37] "exp_noninstr_food_bene"     "exp_noninstr_ent_ops_total"
## [39] "exp_noninstr_ent_ops_bene"  "exp_noninstr_other"        
## [41] "exp_covid_total"            "exp_covid_instr"           
## [43] "exp_covid_supp"             "exp_covid_cap_out"         
## [45] "exp_covid_tech_supp"        "exp_covid_tech_equip"      
## [47] "exp_covid_supp_plant"       "exp_covid_food"

Multiple Years and States

The get_finance_data() function makes it easy to access data across multiple years and states:

# get data for multiple states across multiple years
sec_data <- get_finance_data(
  yr = "2018:2022",  # years 2018 through 2022
  geo = "AL,AR,FL,GA,KY,LA,MS,MO,OK,SC,TN,TX"  # comma-separated state codes
)

# get the most recent year of data for all states
us_sy22 <- get_finance_data(yr = 2022, geo = "all")

Working with the Data

Once you’ve retrieved the data, you can use standard data manipulation tools to analyze it. Here are some common analysis patterns:

Analyze Local vs. Total Revenue Per-Pupil

# download 2022 data for connecticut
ct_sy22 <- get_finance_data(yr = "2022", geo = "CT")

# plot local revenue vs. total revenue w/ urbanicity + enrollment
ggplot(ct_sy22) +
  geom_point(aes(
    x = rev_local_pp, 
    y = rev_total_pp,
    color = urbanicity,
    size = enroll),
    alpha = .6) +
  scale_size_area(
    max_size = 10,
    labels = scales::label_comma()
    ) +    
  scale_x_continuous(labels = scales::label_dollar()) +
  scale_y_continuous(labels = scales::label_dollar()) +
  labs(
    title = "Connecticut Districts' Local vs. Total Revenue Per-Pupil, SY2021-22",
    x = "Local Revenue Per-Pupil", 
    y = "Total Revenue Per-Pupil", 
    size = "Enrollment", 
    color = "Urbanicity") +
  theme_bw()

Analyzing Revenue Sources by Urbanicity

# compare revenue sources across districts
revenue_analysis <- ct_sy22 |>
  mutate(
    pct_local = rev_local / rev_total,
    pct_state = rev_state / rev_total,
    pct_federal = rev_fed / rev_total
  ) |>
  select(dist_name, urbanicity, enroll, pct_local, pct_state, pct_federal) |>
  group_by(urbanicity) |>
  summarize(
    avg_pct_local = mean(pct_local, na.rm = TRUE),
    avg_pct_state = mean(pct_state, na.rm = TRUE),
    avg_pct_federal = mean(pct_federal, na.rm = TRUE),
    n_districts = n(),
    enrollment = sum(enroll, na.rm = TRUE)
  )

print(revenue_analysis)
## # A tibble: 4 × 6
##   urbanicity avg_pct_local avg_pct_state avg_pct_federal n_districts enrollment
##   <fct>              <dbl>         <dbl>           <dbl>       <int>      <dbl>
## 1 City               0.250         0.608          0.143           29     126751
## 2 Suburb             0.649         0.289          0.0623          85     285922
## 3 Town               0.584         0.351          0.0649           8      14823
## 4 Rural              0.649         0.303          0.0490          65      52477

Additional Resources

For more information about the data and methods used in this package:

  • Use list_variables() to see all available variables and their descriptions.
  • Use get_states() to see valid state codes.
  • See the “CPI Adjustments” vignette for information about inflation adjustments.
  • See the “Data Sources and Methods” vignette for detailed methodology.