medicaid_utils.preprocessing package
Submodules
medicaid_utils.preprocessing.max_cc module
This module has MAXCC class which wraps together cleaning/ preprocessing routines specific for MAX CC files
- class medicaid_utils.preprocessing.max_cc.MAXCC(year: int, state: str, data_root: str, index_col: str = 'BENE_MSIS', clean: bool = True, preprocess: bool = True, pq_engine: str = 'pyarrow')[source]
Bases:
MAXFile
Scripts to preprocess CC file
medicaid_utils.preprocessing.max_file module
This module has the MAXFile class, which is the base class for all MAX file type classes
- class medicaid_utils.preprocessing.max_file.MAXFile(ftype: str, year: int, state: str, data_root: str, index_col: str = 'BENE_MSIS', clean: bool = True, preprocess: bool = True, tmp_folder: str | None = None, pq_engine: str = 'pyarrow')[source]
Bases:
object
Parent class for all MAX file classes, each of which will have clean and preprocess functions
- add_gender() None [source]
Adds integer ‘female’ column based on ‘EL_SEX_CD’ column. Undefined values (‘U’) in EL_SEX_CD column will result in female column taking the value -1
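The mapping described above can be sketched in plain Python. The helper name `female_flag` is hypothetical, and the 'F' -> 1 / 'M' -> 0 encoding is an assumption; the documentation only states that undefined values ('U') become -1. The library itself applies this to a dataframe column rather than to single values.

```python
from typing import Optional


def female_flag(sex_cd: Optional[str]) -> int:
    """Hypothetical helper: map a sex code to the integer 'female' flag."""
    # Assumption: 'F' -> 1 and 'M' -> 0. Anything else, including 'U'
    # and missing values, falls back to -1.
    mapping = {"F": 1, "M": 0}
    return mapping.get((sex_cd or "").strip().upper(), -1)
```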
- cache_results(repartition=False)[source]
Save results in intermediate steps of some lengthy processing. Saving intermediate results speeds up processing
- Parameters:
repartition (bool, default=False) – Repartition the dask dataframe
- calculate_payment()[source]
Calculates payment amount.
New Column(s):
pymt_amt - “MDCD_PYMT_AMT” + “TP_PYMT_AMT”
- clean_diag_codes()[source]
Clean diagnostic code columns by removing non-alphanumeric characters and converting them to upper case
- clean_proc_codes()[source]
Clean procedure code columns by removing non-alphanumeric characters and converting them to upper case
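A minimal sketch of the code-cleaning rule (strip non-alphanumeric characters, then upper-case), written for a single string. The function name `clean_code` is hypothetical; the library applies the equivalent transformation to whole dataframe columns.

```python
import re


def clean_code(raw: str) -> str:
    """Hypothetical helper: keep only letters and digits, in upper case."""
    return re.sub(r"[^A-Za-z0-9]", "", raw).upper()
```

For example, `clean_code(" v30.00 ")` yields `"V3000"`.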
- flag_ed_use() None [source]
Detects ED use in claims.
New Column(s):
ed_cpt - 0 or 1, Claim has a procedure billed in ED code range (99281–99285) (PRCDR_CD_SYS_{1-6} == 01 & PRCDR_CD_{1-6} in (99281–99285))
ed_ub92 - 0 or 1, Claim has a revenue center code (0450 - 0459, 0981) - UB_92_REV_CD_GP_{1-23}
ed_tos - 0 or 1, Claim has an outpatient type of service (MAX_TOS = 11) (if ftype == ‘ip’)
ed_pos - 0 or 1, Claim has PLC_OF_SRVC_CD set to 23 (if ftype == ‘ot’)
ed_use - any of ed_cpt, ed_ub92, ed_tos or ed_pos is 1
any_ed - 0 or 1, 1 when any other claim from the same visit has ed_use set to 1 (if ftype == ‘ot’)
- Uses the below as reference:
If the patient is a Medicare beneficiary, the general surgeon should bill the level of ED code (99281–99285) (http://bulletin.facs.org/2013/02/coding-for-hospital-admission)
Inpatient files: Revenue Center Codes 0450-0459, 0981 (https://www.resdac.org/resconnect/articles/144)
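The flag definitions above can be illustrated with a row-level sketch that takes one claim as a plain dict. This is not the library's implementation: the function works on a single record rather than a dask dataframe, it assumes all codes are stored as zero-padded strings, and it leaves out any_ed, which requires grouping claims by visit.

```python
ED_CPT_CODES = {str(code) for code in range(99281, 99286)}
ED_REV_CODES = {f"{code:04d}" for code in range(450, 460)} | {"0981"}


def flag_ed_use(claim: dict, ftype: str) -> dict:
    """Hypothetical row-level version of the ED flags for one claim."""
    # Pair each procedure coding system with its procedure code (slots 1-6).
    procedures = [
        (claim.get(f"PRCDR_CD_SYS_{i}"), claim.get(f"PRCDR_CD_{i}"))
        for i in range(1, 7)
    ]
    flags = {
        "ed_cpt": int(
            any(system == "01" and code in ED_CPT_CODES for system, code in procedures)
        ),
        "ed_ub92": int(
            any(claim.get(f"UB_92_REV_CD_GP_{i}") in ED_REV_CODES for i in range(1, 24))
        ),
        # Assumption: MAX_TOS and PLC_OF_SRVC_CD are strings.
        "ed_tos": int(ftype == "ip" and claim.get("MAX_TOS") == "11"),
        "ed_pos": int(ftype == "ot" and claim.get("PLC_OF_SRVC_CD") == "23"),
    }
    flags["ed_use"] = int(any(flags.values()))
    return flags
```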
- classmethod get_claim_instance(claim_type, *args, **kwargs)[source]
Returns an instance of the requested claim type
- pq_export(dest_path_and_fname)[source]
Export parquet files (overwrite safe)
- Parameters:
dest_path_and_fname (str) – Destination path
- process_date_cols()[source]
Convert datetime columns to datetime type and add basic date based constructed variables
New columns:
birth_year, birth_month, birth_day - date components of EL_DOB (date of birth)
birth_date - birth date (EL_DOB)
death - 0 or 1, if EL_DOD or MDCR_DOD is not empty and falls in the claim year or before
age - age in years, integer format
age_decimal - age in years, with decimals
age_day - age in days
adult - 0 or 1, 1 when patient’s age is in [18,115]
child - 0 or 1, 1 when patient’s age is in [0,17]
- if ftype == ‘ip’:
Clean/ impute admsn_date and add ip duration related columns
New column(s):
admsn_date - Admission date (ADMSN_DT)
srvc_bgn_date - Service begin date (SRVC_BGN_DT)
srvc_end_date - Service end date (SRVC_END_DT)
prncpl_proc_date - Principal procedure date (PRNCPL_PRCDR_DT)
missing_admsn_date - 0 or 1, 1 when admission date is missing
missing_prncpl_proc_date - 0 or 1, 1 when principal procedure date is missing
flag_admsn_miss - 0 or 1, 1 when admsn_date was imputed
los - ip service duration in days
ageday_admsn - age in days as on admsn_date
age_admsn - age in years, with decimals, as on admsn_date
age_prncpl_proc - age in years as on principal procedure date
age_day_prncpl_proc - age in days as on principal procedure date
- if ftype == ‘ot’:
Adds duration column, provided service end and begin dates are clean
New Column(s):
srvc_bgn_date - Service begin date (SRVC_BGN_DT)
srvc_end_date - Service end date (SRVC_END_DT)
diff & duration - duration of service in days
age_day_srvc_bgn - age in days as on service begin date
age_srvc_bgn - age in years, with decimals, as on service begin date
- if ftype == ‘ps’:
New Column(s):
date_of_death - Date of death (EL_DOD)
medicare_date_of_death - Medicare date of death (MDCR_DOD)
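The age-related columns can be sketched as follows. The helper name `age_columns`, the choice of reference date, and the 365.25-day year are all assumptions made for illustration; the documentation does not state which reference date or year length the library uses.

```python
from datetime import date


def age_columns(birth_date: date, reference_date: date) -> dict:
    """Hypothetical helper: derive age columns relative to a reference date."""
    age_day = (reference_date - birth_date).days
    # Assumption: convert days to years with a 365.25-day year.
    age_decimal = age_day / 365.25
    age = int(age_decimal)
    return {
        "age_day": age_day,
        "age_decimal": age_decimal,
        "age": age,
        "adult": int(18 <= age <= 115),
        "child": int(0 <= age <= 17),
    }
```

Calling `age_columns(date(2000, 1, 1), date(2012, 12, 31))` marks the bene as a child, since the integer age falls in [0,17].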
medicaid_utils.preprocessing.max_ip module
This module has MAXIP class which wraps together cleaning/ preprocessing routines specific for MAX IP files
- class medicaid_utils.preprocessing.max_ip.MAXIP(year: int, state: str, data_root: str, index_col: str = 'BENE_MSIS', clean: bool = True, preprocess: bool = True, tmp_folder: str | None = None, pq_engine: str = 'pyarrow')[source]
Bases:
MAXFile
- flag_ip_overlaps()[source]
Identifies duplicate/ overlapping claims. When duplicate/ overlapping claims exist with the same MSIS_ID, the claim with the largest payment amount is retained.
New Column(s):
flag_ip_undup - 0 or 1, 1 when row is not a duplicate
flag_ip_dup_drop - 0 or 1, 1 when row is duplicate and must be dropped
flag_ip_overlap_drop - 0 or 1, 1 when row overlaps with another claim
ip_incl - 0 or 1, 1 when row is clean (flag_ip_dup_drop = 0 & flag_ip_overlap_drop = 0) and has los > 0
- Parameters:
dd.DataFrame (df) –
- Return type:
None
medicaid_utils.preprocessing.max_ot module
This module has MAXOT class which wraps together cleaning/ preprocessing routines specific for MAX OT files
- class medicaid_utils.preprocessing.max_ot.MAXOT(year: int, state: str, data_root: str, index_col: str = 'BENE_MSIS', clean: bool = True, preprocess: bool = True, tmp_folder=None, pq_engine: str = 'pyarrow')[source]
Bases:
MAXFile
- add_ot_flags()[source]
Assigns flags for IP, OT and ED calculation, based on a hierarchical principle: IP first, then ED, and then OT. Marks claims that have overlapping IP claims, have ED services, or have only OT services.
New Column(s):
ip_incl - 0 or 1, 1 when has no dental and transport claims, and has an overlapping IP claim
ed_incl - 0 or 1, 1 when has no dental and transport claims, has no overlapping IP claim, and has an ED service in any visits corresponding to this claim
ot_incl - 0 or 1, 1 when has no dental and transport claims, has no overlapping IP claim, and has no ED service in any visits corresponding to this claim
flag_drop - 0 or 1, 1 when ip_incl, ed_incl and ot_incl are all null
- find_ot_ip_overlaps(df_ip: DataFrame)[source]
Checks for OT claims that have an overlapping IP claim.
New Column(s):
overlap - 0 or 1, 1 when OT claim has an overlapping IP claim
- Parameters:
df_ip (DataFrame) – IP DataFrame
- flag_dental()[source]
Flag dental claims.
New Column(s):
dental_TOS - 0 or 1, 1 when MAX_TOS = 9
dental_PRCDR - 0 or 1, 1 when PRCDR_CD starts with ‘D’
dental - 0 or 1, 1 when any of dental_TOS or dental_PRCDR
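A sketch of the dental flag logic for a single claim. The helper name and the tolerant handling of MAX_TOS (accepting 9, "9" or "09") are assumptions; the documentation only gives the conditions MAX_TOS = 9 and PRCDR_CD starting with 'D'.

```python
def flag_dental(max_tos, prcdr_cd) -> dict:
    """Hypothetical helper: dental flags for one claim."""
    try:
        # Assumption: MAX_TOS may arrive as an int or a zero-padded string.
        dental_tos = int(int(max_tos) == 9)
    except (TypeError, ValueError):
        dental_tos = 0
    dental_prcdr = int(str(prcdr_cd or "").strip().upper().startswith("D"))
    return {
        "dental_TOS": dental_tos,
        "dental_PRCDR": dental_prcdr,
        "dental": int(dental_tos or dental_prcdr),
    }
```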
- flag_em()[source]
Flag claim if procedure code belongs to E/M category.
New Column(s):
EM - 0 or 1, 1 when PRCDR_CD in [99201, 99215] or [99301, 99350]
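The E/M check can be sketched as a range test on a single procedure code. The helper name is hypothetical, and the bracketed pairs are read here as inclusive ranges (99201 to 99215 and 99301 to 99350), which is an interpretation of the notation rather than something the documentation spells out.

```python
def flag_em(prcdr_cd) -> int:
    """Hypothetical helper: 1 when the code falls in an E/M range."""
    code = str(prcdr_cd or "").strip()
    # Only five-digit numeric codes can fall in the E/M ranges.
    if not (len(code) == 5 and code.isascii() and code.isdigit()):
        return 0
    number = int(code)
    return int(99201 <= number <= 99215 or 99301 <= number <= 99350)
```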
- flag_ip_overlaps_and_ed(df_ip: DataFrame)[source]
Adds flags to indicate overlaps with IP claims
- Parameters:
df_ip (pd.DataFrame) – IP claim dataframe
medicaid_utils.preprocessing.max_ps module
This module has MAXPS class which wraps together cleaning/ preprocessing routines specific for MAX PS files
- class medicaid_utils.preprocessing.max_ps.MAXPS(year: int, state: str, data_root: str, index_col: str = 'BENE_MSIS', clean: bool = True, preprocess: bool = True, rural_method: str = 'ruca', tmp_folder: str | None = None, pq_engine: str = 'pyarrow')[source]
Bases:
MAXFile
Scripts to preprocess PS file
- add_eligibility_status_columns() None [source]
Add eligibility columns based on MAX_ELG_CD_MO_{month} values for each month. The MAX_ELG_CD_MO values 00 (NOT ELIGIBLE) and 99 (UNKNOWN ELIGIBILITY) denote ineligibility.
- New Column(s):
elg_mon_{month} - 0 or 1 value column, denoting eligibility for each month
total_elg_mon - No. of eligible months
elg_full_year - 0 or 1 value column, 1 if total_elg_mon = 12
elg_over_9mon - 0 or 1 value column, 1 if total_elg_mon >= 9
elg_over_6mon - 0 or 1 value column, 1 if total_elg_mon >= 6
elg_cont_6mon - 0 or 1 value column, 1 if patient has 6 continuous eligible months
mas_elg_change - 0 or 1 value column, 1 if patient had multiple mas group memberships during claim year
mas_assignments - comma separated list of MAS assignments
boe_assignments - comma separated list of BOE assignments
dominant_boe_group - BOE status held for the most number of months
boe_elg_change - 0 or 1 value column, 1 if patient had multiple boe group memberships during claim year
child_boe_elg_change - 0 or 1 value column, 1 if patient had multiple boe group memberships during claim year
elg_change - 0 or 1 value column, 1 if patient had multiple eligibility group memberships during claim year
eligibility_aged - Eligibility as aged anytime during the claim year
eligibility_child - Eligibility as child anytime during the claim year
max_gap - Maximum gap in enrollment in months
max_cont_enrollment - Maximum duration of continuous enrollment
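The month-count columns can be illustrated with a sketch over a list of twelve monthly codes. The helper name is hypothetical, the codes are assumed to be two-character strings, and the gap definition used here (longest run of ineligible months anywhere in the year) is an assumption, since the documentation does not say how leading or trailing gaps are treated.

```python
from typing import Sequence

INELIGIBLE_CODES = {"00", "99"}


def eligibility_summary(monthly_codes: Sequence[str]) -> dict:
    """Hypothetical helper: summarise 12 monthly MAX_ELG_CD_MO codes."""
    elg = [int(code not in INELIGIBLE_CODES) for code in monthly_codes]
    total = sum(elg)
    longest_run = longest_gap = run = gap = 0
    for eligible in elg:
        # Track the current streak of eligible or ineligible months.
        run, gap = (run + 1, 0) if eligible else (0, gap + 1)
        longest_run = max(longest_run, run)
        longest_gap = max(longest_gap, gap)
    return {
        "total_elg_mon": total,
        "elg_full_year": int(total == 12),
        "elg_over_9mon": int(total >= 9),
        "elg_over_6mon": int(total >= 6),
        "elg_cont_6mon": int(longest_run >= 6),
        "max_gap": longest_gap,
        "max_cont_enrollment": longest_run,
    }
```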
- flag_common_exclusions()[source]
Adds exclusion flags.
New Column(s):
excl_duplicated_bene_id - 0 or 1, 1 when bene’s index column is repeated
- flag_duals()[source]
Flags dual patients.
New column(s):
dual - 0 or 1 column, 0 if 0 <= EL_MDCR_DUAL_ANN <= 9 for years 2007, 2009, 2011 0 <= EL_MDCR_DUAL_ANN <= 9 for other years
- flag_restricted_benefits()[source]
Checks individual’s eligibility for various medicaid services, based on EL_RSTRCT_BNFT_FLG_{month} values:
1 = full scope; INDIVIDUAL IS ELIGIBLE FOR MEDICAID DURING THE MONTH AND IS ENTITLED TO THE FULL SCOPE OF MEDICAID BENEFITS.
2 = alien; INDIVIDUAL IS ELIGIBLE FOR MEDICAID DURING THE MONTH BUT ONLY ENTITLED TO RESTRICTED BENEFITS BASED ON ALIEN STATUS
3 = dual
4 = pregnancy
5 = other, eg. substance abuse, medically needy
6 = family planning
7 = alternative package of benchmark equivalent coverage, 2011 data had no values of 7 and 8
8 = “money follows the person” rebalancing demonstration, 2011 data had no values of 7 and 8
9 = unknown
A = Psychiatric residential treatments demonstration
B = Health Opportunity Account
C = CHIP dental coverage, supplemental to employer sponsored insurance
W = Medicaid health insurance premium payment assistance (MA, NJ, VT, OK)
X = rx drug
Y = drug and dual
Z = drug and dual, but Medicaid was not paying for the benefits.
Benefits are non-comprehensive (restricted) when EL_RSTRCT_BNFT_FLG_{month} has any of the below values:
“2”, “3”, “6”: for states other than “AR”, “ID”, “SD”
“2”, “4”, “3”, “6”: for states “AR”, “ID”, “SD”
- New column(s):
any_restricted_benefit_month: 0 or 1, 1 when bene’s benefits are restricted for at least 1 month
restricted_benefit_months: Number of restricted benefit months
restricted_benefits: 0 or 1, 1 when the number of restricted benefit months is more than the number of months the bene was enrolled in medicaid
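The state-dependent code sets can be sketched as below. The helper names are hypothetical and the monthly flags are assumed to be one-character strings. The restricted_benefits column is left out because it also depends on enrollment months, which this sketch does not model.

```python
from typing import Sequence

STATES_COUNTING_PREGNANCY = {"AR", "ID", "SD"}


def restricted_codes(state: str) -> set:
    """Codes treated as non-comprehensive benefits for the given state."""
    codes = {"2", "3", "6"}
    if state in STATES_COUNTING_PREGNANCY:
        codes.add("4")
    return codes


def restricted_benefit_flags(monthly_flags: Sequence[str], state: str) -> dict:
    """Hypothetical helper: count restricted months from monthly flag values."""
    codes = restricted_codes(state)
    months = sum(1 for flag in monthly_flags if flag in codes)
    return {
        "restricted_benefit_months": months,
        "any_restricted_benefit_month": int(months > 0),
    }
```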
- flag_rural(method: str = 'ruca')[source]
Classifies benes into rural/ non-rural on the basis of RUCA/ RUCC of their resident ZIP/ FIPS codes
New Columns:
resident_state_cd
rural - 0/ 1/ -1, 1 when bene’s residence is in a rural location, 0 when not. -1 when zip code is missing
pcsa - resident PCSA code
{ruca_code/ rucc_code} - resident ruca_code
- This function uses
RUCA 3.1 dataset (from https://www.ers.usda.gov/webdocs/DataFiles/53241/RUCA2010zipcode.xlsx?v=8673). RUCA codes >= 4 denote rural and the rest denote urban as per https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6286055/#SD1
RUCC codes were downloaded from https://www.ers.usda.gov/webdocs/DataFiles/53251/ruralurbancodes2013.xls?v=2372. RUCC codes >= 8 denote rural and the rest denote urban.
ZCTAs x zipcode crosswalk from UDSMapper (https://udsmapper.org/zip-code-to-zcta-crosswalk/)
zipcodes from multiple sources
Distance between centroids of zipcodes using NBER data (https://nber.org/distance/2016/gaz/zcta5/gaz2016zcta5centroid.csv)
- Parameters:
method ({'ruca', 'rucc'}) – Method to use for rural variable construction
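The thresholds above reduce to a simple comparison once a RUCA or RUCC code has been looked up for the bene's residence. The helper below is a hypothetical sketch: it takes the already-resolved code as input, and it returns -1 when no code is available, standing in for the missing-zip-code case.

```python
from typing import Optional, Union

RURAL_THRESHOLDS = {"ruca": 4, "rucc": 8}


def rural_flag(code: Optional[Union[int, float, str]], method: str = "ruca") -> int:
    """Hypothetical helper: classify a resolved RUCA/RUCC code as rural (1) or not (0)."""
    if code is None:
        # Assumption: a missing code is treated like a missing zip code.
        return -1
    return int(float(code) >= RURAL_THRESHOLDS[method])
```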
- flag_tanf()[source]
The Temporary Assistance for Needy Families (TANF) program provides temporary financial assistance for pregnant women and families with one or more dependent children. This provides financial assistance to help pay for food, shelter, utilities, and expenses other than medical. In MAX files this is identified via
- EL_TANF_CASH_FLG:
1 = INDIVIDUAL DID NOT RECEIVE TANF BENEFITS DURING THE MONTH;
2 = INDIVIDUAL DID RECEIVE TANF BENEFITS DURING THE MONTH. CO and ID either 0 or 9
- New Column(s):
tanf : 0 or 1, denoting usage of TANF benefits in any of the months
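As a small illustration, the tanf column amounts to checking whether any monthly flag equals 2. The helper name is hypothetical and the flags are compared as strings, which is an assumption about storage type.

```python
from typing import Iterable


def flag_tanf(monthly_flags: Iterable) -> int:
    """Hypothetical helper: 1 when TANF benefits were received in any month."""
    return int(any(str(flag).strip() == "2" for flag in monthly_flags))
```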
medicaid_utils.preprocessing.taf_file module
This module has the TAFFile class, which is the base class for all TAF file type classes
- class medicaid_utils.preprocessing.taf_file.TAFFile(ftype: str, year: int, state: str, data_root: str, index_col: str = 'BENE_MSIS', clean: bool = True, preprocess: bool = True, tmp_folder: str | None = None, pq_engine: str = 'pyarrow')[source]
Bases:
object
Parent class for all TAF file classes, each of which will have clean and preprocess functions
- cache_results(subtype=None, repartition=False)[source]
Save results in intermediate steps of some lengthy processing. Saving intermediate results speeds up processing, and avoids dask cluster crashes for large datasets
- clean()[source]
Cleaning routines to process date and gender columns, and add duplicate check flags.
- clean_codes() None [source]
Clean diagnostic code columns by removing non-alphanumeric characters and converting them to upper case and NDC codes columns by removing white space characters and padding 0s to the left so the codes are of length 12
- clean_diag_codes() None [source]
Clean diagnostic code columns by removing non-alphanumeric characters and converting them to upper case
- clean_ndc_codes() None [source]
Clean NDC codes columns by removing white space characters and padding 0s to the left so the codes are of length 12
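The NDC rule (drop whitespace, left-pad with zeros to 12 characters) can be sketched for a single value. The function name `clean_ndc` is hypothetical.

```python
import re


def clean_ndc(raw: str) -> str:
    """Hypothetical helper: remove whitespace and left-pad with zeros to length 12."""
    return re.sub(r"\s+", "", raw).rjust(12, "0")
```

For example, `clean_ndc(" 0002 1433 80 ")` yields `"000002143380"`.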
- clean_proc_codes()[source]
Clean procedure code columns by removing non-alphanumeric characters and converting them to upper case
- flag_duplicates()[source]
Removes duplicated rows. TAF claims have multiple versions for each month. This function keeps the most recent file version date for each month using the variables IP_VRSN, LT_VRSN, OT_VRSN, and RX_VRSN. Retains only the claims with maximum value of production data run ID (DA_RUN_ID) for each claim ID (CLM_ID).
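The last step described above, retaining the claim with the maximum DA_RUN_ID for each CLM_ID, can be sketched over a list of dict records. This is an illustration only: the helper name is hypothetical, it ignores the file-version step, and the library performs the equivalent operation on dask dataframes.

```python
from typing import Dict, Iterable, List


def latest_run_per_claim(rows: Iterable[dict]) -> List[dict]:
    """Hypothetical helper: keep the row with the largest DA_RUN_ID per CLM_ID."""
    latest: Dict[str, dict] = {}
    for row in rows:
        current = latest.get(row["CLM_ID"])
        if current is None or row["DA_RUN_ID"] > current["DA_RUN_ID"]:
            latest[row["CLM_ID"]] = row
    return list(latest.values())
```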
- flag_ffs_and_encounter_claims()[source]
Flags claims where CLM_TYPE_CD is equal to one of the following values:
1: A FFS Medicaid or Medicaid-expansion claim
3: Medicaid or Medicaid-expanding managed care encounter record
A: Separate CHIP (Title XXI) FFS claim
C: Separate CHIP (Title XXI) encounter record
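A sketch of the membership test behind this flag, for a single CLM_TYPE_CD value. The helper name is hypothetical, and the case-insensitive string comparison is an assumption about how the codes are stored.

```python
FFS_OR_ENCOUNTER_CODES = {"1", "3", "A", "C"}


def is_ffs_or_encounter(clm_type_cd) -> int:
    """Hypothetical helper: 1 when the claim type is an FFS or encounter claim."""
    return int(str(clm_type_cd).strip().upper() in FFS_OR_ENCOUNTER_CODES)
```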
- gather_bene_level_diag_ndc_codes()[source]
Constructs patient level NDC and diagnosis code list columns and saves them to individual files
- classmethod get_claim_instance(claim_type, *args, **kwargs)[source]
Returns an instance of the requested claim type
- process_date_cols()[source]
Convert datetime columns to datetime type and add basic date based constructed variables
New columns:
birth_year, birth_month, birth_day - date components of EL_DOB (date of birth)
birth_date - birth date (EL_DOB)
death - 0 or 1, if DEATH_DT is not empty and falls in the claim year or before
age - age in years, integer format
age_decimal - age in years, with decimals
age_day - age in days
adult - 0 or 1, 1 when patient’s age is in [18,115]
child - 0 or 1, 1 when patient’s age is in [0,17]
- If ftype == ‘ip’:
Clean/ impute admsn_date and add ip duration related columns
New column(s):
admsn_date - Admission date (ADMSN_DT)
srvc_bgn_date - Service begin date (SRVC_BGN_DT)
srvc_end_date - Service end date (SRVC_END_DT)
prncpl_proc_date - Principal procedure date (PRCDR_CD_DT_1)
missing_admsn_date - 0 or 1, 1 when admission date is missing
missing_prncpl_proc_date - 0 or 1, 1 when principal procedure date is missing
flag_admsn_miss - 0 or 1, 1 when admsn_date was imputed
los - ip service duration in days
ageday_admsn - age in days as on admsn_date
age_admsn - age in years, with decimals, as on admsn_date
age_prncpl_proc - age in years as on principal procedure date
age_day_prncpl_proc - age in days as on principal procedure date
- if ftype == ‘ot’:
Adds duration column, provided service end and begin dates are clean
New Column(s):
srvc_bgn_date - Service begin date (SRVC_BGN_DT)
srvc_end_date - Service end date (SRVC_END_DT)
diff & duration - duration of service in days
age_day_srvc_bgn - age in days as on service begin date
age_srvc_bgn - age in years, with decimals, as on service begin date
- if ftype == ‘ps’:
New Column(s):
date_of_death - Date of death (DEATH_DT)
medicaid_utils.preprocessing.taf_ip module
This module has TAFIP class which wraps together cleaning/ preprocessing routines specific for TAF IP files
- class medicaid_utils.preprocessing.taf_ip.TAFIP(year: int, state: str, data_root: str, index_col: str = 'BENE_MSIS', clean: bool = True, preprocess: bool = True, tmp_folder: str | None = None, pq_engine: str = 'pyarrow')[source]
Bases:
TAFFile
- clean()[source]
Cleaning routines to clean diagnosis & procedure code columns, process date and gender columns, and add duplicate check flags.
- flag_common_exclusions()[source]
Adds commonly used IP claim exclusion flag columns. New Columns:
ffs_or_encounter_claim, 0 or 1, 1 when base claim is an FFS or Encounter claim
excl_missing_dob, 0 or 1, 1 when base claim does not have birth date
excl_missing_admsn_date, 0 or 1, 1 when base claim does not have admission date
excl_missing_prncpl_proc_date, 0 or 1, 1 when base claim does not have principal procedure date
medicaid_utils.preprocessing.taf_lt module
This module has TAFLT class which wraps together cleaning/ preprocessing routines specific for TAF LT files
- class medicaid_utils.preprocessing.taf_lt.TAFLT(year: int, state: str, data_root: str, index_col: str = 'BENE_MSIS', clean: bool = True, preprocess: bool = True, tmp_folder: str | None = None, pq_engine: str = 'pyarrow')[source]
Bases:
TAFFile
medicaid_utils.preprocessing.taf_ot module
This module has TAFOT class which wraps together cleaning/ preprocessing routines specific for TAF OT files
- class medicaid_utils.preprocessing.taf_ot.TAFOT(year: int, state: str, data_root: str, index_col: str = 'BENE_MSIS', clean: bool = True, preprocess: bool = True, tmp_folder: str | None = None, pq_engine: str = 'pyarrow')[source]
Bases:
TAFFile
- clean()[source]
Cleaning routines to clean diagnosis & procedure code columns, process date and gender columns, and add duplicate check flags.
- flag_common_exclusions()[source]
Adds commonly used OT claim exclusion flag columns.
New Columns:
ffs_or_encounter_claim, 0 or 1, 1 when base claim is an FFS or Encounter claim
excl_missing_dob, 0 or 1, 1 when base claim does not have birth date
excl_missing_srvc_bgn_date, 0 or 1, 1 when base claim does not have service begin date
medicaid_utils.preprocessing.taf_ps module
This module has TAFPS class which wraps together cleaning/ preprocessing routines specific for TAF PS files
- class medicaid_utils.preprocessing.taf_ps.TAFPS(year: int, state: str, data_root: str, index_col: str = 'BENE_MSIS', clean: bool = True, preprocess: bool = True, rural_method: str = 'ruca', tmp_folder: str | None = None, pq_engine: str = 'pyarrow')[source]
Bases:
TAFFile
Scripts to preprocess PS file
- add_gender()[source]
Adds integer ‘female’ column based on ‘SEX_CD’ column. Undefined values (‘U’) in SEX_CD column will result in female column taking the value -1
- add_mas_boe()[source]
Adds columns denoting number of months in each Maintenance Assistance Status (MAS) and Basis of Eligibility (BOE) category. Columns added are,
boe_chip_months : Number of months in Separate-CHIP BOE category
boe_aged_months : Number of months in Aged BOE category
boe_blind_disabled_months : Number of months in Blind/Disabled BOE category
boe_child_months : Number of months in Children BOE category
boe_adults_months : Number of months in Adult BOE category
boe_breast_and_cervical_cancer_months : Number of months in Breast and Cervical Cancer Prevention and Treatment Act of 2000 BOE category
boe_child_of_unemployed_months : Number of months in Child of Unemployed Adult BOE category
boe_unemployed_months : Number of months in Unemployed Adult BOE category
boe_foster_care_children_months : Number of months in Foster Care Children BOE category
boe_unknown_months : Number of months in Unknown BOE category
mas_chip_months : Number of months in Separate-CHIP MAS category
mas_cash_sec_1931_months : Number of months in Individuals receiving cash assistance or eligible under section 1931 of the Act MAS category
mas_medically_needy_months : Number of months in Medically Needy MAS category
mas_poverty_months : Number of months in Poverty Related Eligibles MAS category
mas_other_months : Number of months in Other Eligibles MAS category
mas_demonstration_months : Number of months in Section 1115 Demonstration expansion eligible MAS category
mas_unknown_months : Number of months in Unknown MAS category
max_mas_type : Top MAS category for the bene
max_boe_type : Top BOE category for the bene
- add_risk_adjustment_scores()[source]
Adds bene level risk adjustment scores. Currently supports Elixhauser scores.
- compute_enrollment_gaps()[source]
Computes enrollment gaps using dates file. Adds number of enrollment gaps and length of maximum enrollment gap in days columns
- flag_common_exclusions()[source]
Adds commonly used exclusion flags
New Column(s):
excl_duplicated_bene_id - 0 or 1, 1 when bene’s index column is repeated
- flag_dual()[source]
Flags duals. Benes with DUAL_ELGBL_CD equal to 1 (full dual), 2 (partial dual), or 3 (other dual) in any month are flagged as duals.
- flag_ffs_months()[source]
Creates flags for months enrolled in medicaid without enrollment in managed care plans of 3 categories, and adds columns denoting total number of months enrolled in these plans and the enrollment sequence pattern.
- flag_managed_care_months()[source]
Creates flags for 3 categories of managed care plans for each month, and adds columns denoting total number of months enrolled in these plans and the enrollment sequence pattern.
- flag_medicaid_enrolled_months()[source]
Creates flags for medicaid enrollment for each month and computes the total number of months enrolled in medicaid. A bene has to be enrolled for all days of the month, without missing eligibility information, for the month to be considered a medicaid enrolled month.
- flag_restricted_benefits()[source]
Flags beneficiaries whose benefits are restricted. Benes with the below values in their RSTRCTD_BNFTS_CD_XX columns are NOT assumed to have restricted benefits:
1. Individual is eligible for Medicaid or CHIP and entitled to the full scope of Medicaid or CHIP benefits.
4. Individual is eligible for Medicaid or CHIP but only entitled to restricted benefits for pregnancy-related services.
5. Individual is eligible for Medicaid or Medicaid-Expansion CHIP but, for reasons other than alien, dual-eligibility or pregnancy-related status, is only entitled to restricted benefits (e.g., restricted benefits based upon substance abuse, medically needy or other criteria).
7. Individual is eligible for Medicaid and entitled to Medicaid benefits under an alternative package of benchmark-equivalent coverage, as enacted by the Deficit Reduction Act of 2005.
Reference: Identifying beneficiaries with a substance use disorder
- flag_rural(method: str = 'ruca')[source]
Classifies benes into rural/ non-rural on the basis of RUCA/ RUCC of their resident ZIP/ FIPS codes
New Columns:
resident_state_cd
rural - 0/ 1/ np.nan, 1 when bene’s residence is in a rural location, 0 when not, -1 when zip code is missing
pcsa - resident PCSA code
{ruca_code/ rucc_code} - resident ruca_code
This function uses
RUCA 3.1 dataset. RUCA codes >= 4 denote rural and the rest denote urban as per Cole, Megan B et al
RUCC codes. RUCC codes >= 8 denote rural and the rest denote urban.
ZCTAs x zipcode crosswalk from UDSMapper.
zipcodes from multiple sources
Distance between centroids of zipcodes using NBER data
- Parameters:
method ({'ruca', 'rucc'}) – Method to use for rural variable construction
- flag_tanf()[source]
The Temporary Assistance for Needy Families (TANF) program provides temporary financial assistance for pregnant women and families with one or more dependent children. This provides financial assistance to help pay for food, shelter, utilities, and expenses other than medical. In TAF files this is identified via
- TANF_CASH_CD:
1: INDIVIDUAL DID NOT RECEIVE TANF BENEFITS DURING THE YEAR;
2: INDIVIDUAL DID RECEIVE TANF BENEFITS DURING THE YEAR
medicaid_utils.preprocessing.taf_rx module
This module has TAFRX class which wraps together cleaning/ preprocessing routines specific for TAF Pharmacy files