medicaid_utils.other_datasets package
Submodules
medicaid_utils.other_datasets.fqhc module
medicaid_utils.other_datasets.hcris module
- medicaid_utils.other_datasets.hcris.clean_address(address)
Cleans address columns by removing place names from street address lines and concatenating multiple occupancy identifiers/street numbers.
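The place-name step can be pictured with a small regex sketch. It assumes the record's own city and state are already known, strips them only when they trail the street line, and upper-cases the result. `strip_place_name` is a hypothetical helper, not the package's function, and the real cleaning rules may differ.

```python
import re


def strip_place_name(street_line: str, city: str, state: str) -> str:
    """Remove a trailing "CITY, ST" / "CITY ST" suffix from a street line.

    Hypothetical helper: assumes city/state come from the record's own
    city and state columns. Upper-casing is an illustrative choice.
    """
    line = " ".join(street_line.upper().split())  # normalize case and spacing
    pattern = (
        rf"[,\s]*{re.escape(city.upper())}"        # the city name...
        rf"[,\s]*(?:{re.escape(state.upper())})?"  # ...optionally followed by state
        r"\s*$"                                    # ...only at the end of the line
    )
    return re.sub(pattern, "", line).strip(" ,")


print(strip_place_name("123 Main St, Springfield, IL", "Springfield", "IL"))
# → 123 MAIN ST
```

Anchoring the pattern at the end of the line keeps street names that merely contain the city, such as "100 Springfield Ave", intact.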
- medicaid_utils.other_datasets.hcris.combine_and_clean_hcris_files(logger_name)
Combines HCRIS provider info files and standardizes column names. The provider ID is cleaned by removing non-alphanumeric characters, and an additional version of the provider ID without leading zeros is created. Files are downloaded for multiple years, starting with 2014, the earliest year for which archives are available:
2014: https://web.archive.org/web/20141120072219/http://downloads.cms.gov/Files/hcris/HCLINIC-REPORTS.zip
2015: https://web.archive.org/web/20151230005431/http://downloads.cms.gov/Files/hcris/HCLINIC-REPORTS.zip
2016: https://web.archive.org/web/20161228145429/http://downloads.cms.gov/Files/hcris/HCLINIC-REPORTS.zip
2017: https://web.archive.org/web/20171220152135/http://downloads.cms.gov/Files/hcris/HCLINIC-REPORTS.zip
2018: https://web.archive.org/web/20181226170727/http://downloads.cms.gov/Files/hcris/HCLINIC-REPORTS.zip
2019: https://web.archive.org/web/20191226132614/http://downloads.cms.gov/Files/hcris/FQHC14-REPORTS.zip
2020: https://web.archive.org/web/20201225141117/https://downloads.cms.gov/Files/hcris/FQHC14-REPORTS.zip
2021: https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/Cost-Reports/FQHC-224-2014-form
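A minimal sketch of the two provider-ID variants described above, using only the standard library. `clean_provider_id` is a hypothetical name, and keeping a single "0" for an all-zero ID is an assumption about an edge case the description does not cover.

```python
import re


def clean_provider_id(raw_id: str) -> tuple[str, str]:
    """Return (cleaned_id, cleaned_id_without_leading_zeros).

    Sketch only: non-alphanumeric characters are dropped, then a second
    variant with leading zeros stripped is derived from the cleaned value.
    """
    cleaned = re.sub(r"[^0-9A-Za-z]", "", raw_id)
    # lstrip("0") turns an all-zero ID into ""; keep a single "0" instead
    no_leading_zeros = cleaned.lstrip("0") or ("0" if cleaned else "")
    return cleaned, no_leading_zeros


print(clean_provider_id("01-1800"))
# → ('011800', '11800')
```

Keeping both variants lets a later join match IDs whether or not the other dataset preserved the leading zeros.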
- medicaid_utils.other_datasets.hcris.compute_basic_match_purity(pdf)
Computes Levenshtein-distance-based similarity scores for the address, name, and city columns, and an exact-equality match for the state column.
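The scoring idea can be sketched with a plain dynamic-programming Levenshtein distance scaled to 0-100. The normalization (dividing by the longer string's length), the upper-casing, and the dictionary field names are assumptions for illustration; the package may use a dedicated string-matching library and a different scale.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic DP edit distance (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row we allocate
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale edit distance to 0-100 (100 = identical); assumed normalization."""
    if not a and not b:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / max(len(a), len(b)))


def basic_match_purity(src: dict, npi: dict) -> dict:
    """Hypothetical row-level version: fuzzy address/name/city, exact state."""
    return {
        "address_score": similarity(src["address"].upper(), npi["address"].upper()),
        "name_score": similarity(src["name"].upper(), npi["name"].upper()),
        "city_score": similarity(src["city"].upper(), npi["city"].upper()),
        "state_match": src["state"].upper() == npi["state"].upper(),
    }


print(round(similarity("ACME HEALTH CENTER", "ACME HLTH CENTER"), 1))
# → 88.9
```

State is compared exactly because two-letter codes are short enough that a single typo would still yield a deceptively high fuzzy score.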
- medicaid_utils.other_datasets.hcris.compute_match_purity(pdf)
Computes Levenshtein-distance-based similarity scores for the address and name columns.
- medicaid_utils.other_datasets.hcris.create_npi_fqhc_crosswalk(source: str = 'hcris', logger_name: str = 'fqhc_crosswalk')
Merges the HCRIS, UDS, and NPPES datasets to create a list of FQHC NPIs.
- medicaid_utils.other_datasets.hcris.filter_partial_matches(df)
Filters source x NPPES matches whose names and addresses show some similarity, i.e., a distance-based similarity score of at least 60.
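A sketch of the threshold filter, assuming each candidate pair already carries 0-100 `name_score` and `address_score` values and that both must reach 60. Whether the real function requires both scores or combines them differently is not stated here; `keep_partial_matches` is a hypothetical name.

```python
PARTIAL_MATCH_THRESHOLD = 60  # cut-off stated in the description above


def keep_partial_matches(rows: list[dict],
                         threshold: float = PARTIAL_MATCH_THRESHOLD) -> list[dict]:
    """Keep candidate pairs whose name AND address scores reach the threshold.

    Assumption: scores are on a 0-100 scale and both columns must pass.
    """
    return [
        row for row in rows
        if row["name_score"] >= threshold and row["address_score"] >= threshold
    ]


candidates = [
    {"npi": "1111111111", "name_score": 92.0, "address_score": 75.0},
    {"npi": "2222222222", "name_score": 58.0, "address_score": 99.0},
    {"npi": "3333333333", "name_score": 60.0, "address_score": 60.0},
]
print([row["npi"] for row in keep_partial_matches(candidates)])
# → ['1111111111', '3333333333']
```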
- medicaid_utils.other_datasets.hcris.fuzzy_match(df, df_npi_provider, source='hcris', logger_name='fuzzy_match')
- medicaid_utils.other_datasets.hcris.nppes_api_requests(pdf, logger_name, **dct_request_params)
Makes NPPES API calls, passing the HCLINIC name, city, state, and ZIP code in each request.
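For reference, the public NPPES NPI Registry API accepts these fields as query parameters. The sketch below only builds the request URL and makes no network call. Restricting results to organizations with `enumeration_type=NPI-2` and the `limit` value are assumptions about how such a lookup might be configured, not the package's documented settings; `build_nppes_query` is a hypothetical helper.

```python
from urllib.parse import urlencode

NPPES_API = "https://npiregistry.cms.hhs.gov/api/"


def build_nppes_query(name: str, city: str, state: str, zipcode: str,
                      limit: int = 200) -> str:
    """Build an NPI Registry search URL for one clinic record (sketch)."""
    params = {
        "version": "2.1",
        "enumeration_type": "NPI-2",  # assumption: organizations only
        "organization_name": name,
        "city": city,
        "state": state,
        "postal_code": zipcode,
        "limit": limit,               # assumption: max page size per request
    }
    return f"{NPPES_API}?{urlencode(params)}"


url = build_nppes_query("Acme Health Center", "Chicago", "IL", "60601")
print(url)
```

To actually send the request, `urllib.request.urlopen(url)` followed by `json.load` on the response would return a JSON payload whose `results` list holds the matching NPI records.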
- medicaid_utils.other_datasets.hcris.process_address_columns(pdf, logger_name, source='hcris')
Combines street-level address components, concatenating multiple values of the same component type, such as multiple street numbers or occupancy identifiers.
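A sketch of the concatenation step, assuming the address has already been tokenized into `(token, label)` pairs in the style of the `usaddress` parser's output. The hyphen separator, the label subset in `STREET_LEVEL`, and both helper names are hypothetical.

```python
# Hypothetical subset of street-level labels, in output order
STREET_LEVEL = (
    "AddressNumber",
    "StreetNamePreDirectional",
    "StreetName",
    "StreetNamePostType",
    "OccupancyType",
    "OccupancyIdentifier",
)


def combine_components(parsed: list[tuple[str, str]], sep: str = "-") -> dict[str, str]:
    """Join all tokens sharing a label, keeping first-seen order."""
    grouped: dict[str, list[str]] = {}
    for token, label in parsed:
        grouped.setdefault(label, []).append(token)
    return {label: sep.join(tokens) for label, tokens in grouped.items()}


def street_line(components: dict[str, str]) -> str:
    """Reassemble a single street-level string from the combined components."""
    return " ".join(components[label] for label in STREET_LEVEL if label in components)


parsed = [
    ("123", "AddressNumber"), ("125", "AddressNumber"),
    ("Main", "StreetName"), ("St", "StreetNamePostType"),
    ("Suite", "OccupancyType"),
    ("4", "OccupancyIdentifier"), ("5", "OccupancyIdentifier"),
]
print(street_line(combine_components(parsed)))
# → 123-125 Main St Suite 4-5
```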
medicaid_utils.other_datasets.npi module
- medicaid_utils.other_datasets.npi.cleanup_raw_npi_files(lst_year: List[int], pq_engine: str = 'fastparquet') → None
Cleans up raw NPPES files for a list of years by standardizing column names, generating long-format mappings from the flat raw file, and creating NPI mappings for a known set of taxonomies.
Parameters:
  lst_year: years to process
  pq_engine: Parquet engine, fastparquet or pyarrow
Returns: None
- medicaid_utils.other_datasets.npi.generate_npi_taxonomy_mappings(year: int, pq_engine: str = 'fastparquet') → None
Generates taxonomy type x NPI mappings for the taxonomy types listed in the taxonomy codes lookup file (taxonomy_codes.csv).
Parameters:
  year: year to process
  pq_engine: Parquet engine, fastparquet or pyarrow
Returns: None
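A sketch of the lookup-driven mapping, assuming the long-format data is a sequence of `(npi, taxonomy_code)` pairs and that the lookup file has `taxonomy_type` and `taxonomy_code` columns. The inline CSV stands in for taxonomy_codes.csv, whose real layout may differ. The two codes shown are the NUCC clinic/center codes for a Federally Qualified Health Center (261QF0400X) and a rural health clinic (261QR1300X).

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for taxonomy_codes.csv; real columns may differ
LOOKUP_CSV = """taxonomy_type,taxonomy_code
fqhc,261QF0400X
rhc,261QR1300X
"""


def npi_taxonomy_mappings(long_rows, lookup_csv: str = LOOKUP_CSV) -> dict[str, set[str]]:
    """Map each taxonomy type in the lookup to the set of NPIs carrying it."""
    code_to_type = {
        row["taxonomy_code"]: row["taxonomy_type"]
        for row in csv.DictReader(io.StringIO(lookup_csv))
    }
    mapping: dict[str, set[str]] = defaultdict(set)
    for npi, code in long_rows:
        taxonomy_type = code_to_type.get(code)
        if taxonomy_type is not None:  # ignore taxonomies not in the lookup
            mapping[taxonomy_type].add(npi)
    return dict(mapping)


long_rows = [
    ("1111111111", "261QF0400X"),
    ("2222222222", "207Q00000X"),  # not in the lookup, so dropped
    ("3333333333", "261QR1300X"),
    ("1111111111", "261QR1300X"),
]
print(npi_taxonomy_mappings(long_rows))
```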
- medicaid_utils.other_datasets.npi.generate_oscar_fqhc_npis(lst_year, pq_engine='fastparquet')
Saves the list of NPIs with FQHC-range OSCAR provider IDs to a pickle file.
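A sketch of the FQHC-range check and the pickle export. It assumes a six-character CCN whose last four digits fall in 1800-1989 marks an FQHC (the CMS CCN range commonly associated with FQHCs; verify against CMS's current assignment rules) and that NPIs arrive as `(npi, provider_id)` pairs. The helper names and output filename are hypothetical.

```python
import pickle
import re
import tempfile
from pathlib import Path

# Assumption: FQHC CCNs = 2-digit state code + 4-digit sequence in 1800-1989
FQHC_RANGE = range(1800, 1990)


def is_fqhc_ccn(ccn: str) -> bool:
    """True if the cleaned 6-digit CCN's sequence number is in the FQHC range."""
    ccn = re.sub(r"[^0-9A-Za-z]", "", ccn)
    return len(ccn) == 6 and ccn.isdigit() and int(ccn[2:]) in FQHC_RANGE


def save_fqhc_npis(pairs: list[tuple[str, str]], out_path: Path) -> list[str]:
    """Collect NPIs with an FQHC-range CCN and pickle the sorted list."""
    npis = sorted({npi for npi, ccn in pairs if is_fqhc_ccn(ccn)})
    with open(out_path, "wb") as fh:
        pickle.dump(npis, fh)
    return npis


pairs = [("1111111111", "141800"), ("2222222222", "140001"), ("3333333333", "14-1989")]
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "fqhc_npis.pickle"
    saved = save_fqhc_npis(pairs, out)
    with open(out, "rb") as fh:
        reloaded = pickle.load(fh)
print(saved)
# → ['1111111111', '3333333333']
```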
- medicaid_utils.other_datasets.npi.wide_to_long_nppes_taxonomy_file(year: int, pq_engine: str) → None
Creates long-format provider and taxonomy mapping datasets from the NPI file. Two versions of the provider mapping file are created, one of which has provider IDs cleaned by removing non-alphanumeric characters.
Parameters:
  year: year to process
  pq_engine: Parquet engine, fastparquet or pyarrow
Returns: None
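The wide-to-long reshaping can be sketched with plain dictionaries. The NPPES full-replacement file stores up to 15 taxonomy slots per NPI in columns such as `Healthcare Provider Taxonomy Code_1` and `Healthcare Provider Primary Taxonomy Switch_1`. The output field names and the helper name below are hypothetical.

```python
def wide_to_long_taxonomy(rows: list[dict], n_slots: int = 15) -> list[dict]:
    """Emit one record per non-empty taxonomy slot of each NPPES row."""
    long_rows = []
    for row in rows:
        for slot in range(1, n_slots + 1):
            code = (row.get(f"Healthcare Provider Taxonomy Code_{slot}") or "").strip()
            if not code:
                continue  # unused slot
            switch = row.get(f"Healthcare Provider Primary Taxonomy Switch_{slot}")
            long_rows.append({
                "npi": row["NPI"],
                "taxonomy_code": code,
                "is_primary": switch == "Y",
            })
    return long_rows


rows = [{
    "NPI": "1111111111",
    "Healthcare Provider Taxonomy Code_1": "261QF0400X",
    "Healthcare Provider Primary Taxonomy Switch_1": "Y",
    "Healthcare Provider Taxonomy Code_2": "207Q00000X",
    "Healthcare Provider Primary Taxonomy Switch_2": "N",
    "Healthcare Provider Taxonomy Code_3": "",
}]
for record in wide_to_long_taxonomy(rows):
    print(record)
```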
medicaid_utils.other_datasets.uds module
- medicaid_utils.other_datasets.uds.clean_address(address)
Cleans address columns by removing place names from street address lines and concatenating multiple occupancy identifiers/street numbers.
- medicaid_utils.other_datasets.uds.clean_hclinic_name(hclinic_name)
Cleans HCLINIC names by removing special characters and extra spaces.
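A regex sketch of the name cleanup. Dropping apostrophes outright (so "Mary's" becomes "MARYS"), replacing other special characters with spaces, and upper-casing are assumptions added for illustration; `clean_name` is a hypothetical helper.

```python
import re


def clean_name(name: str) -> str:
    """Strip special characters and redundant whitespace from a clinic name."""
    name = re.sub(r"['’]", "", name)              # drop apostrophes entirely
    name = re.sub(r"[^A-Za-z0-9 ]+", " ", name)   # other specials become spaces
    return " ".join(name.split()).upper()         # collapse spaces, trim ends


print(clean_name("St. Mary's  Health-Center, Inc."))
# → ST MARYS HEALTH CENTER INC
```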
- medicaid_utils.other_datasets.uds.combine_and_clean_uds_files(lst_year=None, logger_name='uds')
Standardizes UDS column names and flattens the files.
- medicaid_utils.other_datasets.uds.combine_uds_fqhc_npi_crosswalks(lst_year, logger_name, source='uds')
Combines UDS x NPPES matches from all years.
- medicaid_utils.other_datasets.uds.compute_basic_match_purity(pdf)
Computes Levenshtein-distance-based similarity scores for the address, name, and city columns, and an exact-equality match for the state column.
- medicaid_utils.other_datasets.uds.compute_door_no_info(pdf)
Parses address components and extracts door number information.
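A regex stand-in for the door-number step, assuming the door number leads the street line and may be a range such as 123-125 or carry a letter suffix such as 12B. A full address parser would handle more cases; the output keys and helper name are hypothetical.

```python
import re

# Leading number with optional letter suffix, optionally followed by "-<number>"
DOOR_NO = re.compile(r"^\s*(\d+[A-Za-z]?)(?:\s*-\s*(\d+[A-Za-z]?))?\b")


def door_number_info(street: str) -> dict:
    """Return the leading door number (and range end, if any) of a street line."""
    m = DOOR_NO.match(street)
    if m is None:
        return {"door_no": None, "door_no_end": None, "has_door_no": False}
    return {"door_no": m.group(1), "door_no_end": m.group(2), "has_door_no": True}


for street in ("123 Main St", "123-125 Main St", "12B Elm St", "PO Box 55"):
    print(street, door_number_info(street))
```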
- medicaid_utils.other_datasets.uds.compute_match_purity(pdf)
Computes Levenshtein-distance-based similarity scores for the address and name columns.
- medicaid_utils.other_datasets.uds.create_npi_fqhc_crosswalk(source: str = 'hcris', year: int | None = None, logger_name: str = 'fqhc_crosswalk')
Merges the HCRIS, UDS, and NPPES datasets to create a list of FQHC NPIs.
- medicaid_utils.other_datasets.uds.filter_partial_matches(df, less_constrained=False)
Filters source x NPPES matches whose names and addresses show some similarity, i.e., a distance-based similarity score of at least 60.
- medicaid_utils.other_datasets.uds.fuzzy_match(df, df_npi_provider, source='hcris', logger_name='fuzzy_match', export_folder='/Users/Manu_1/Documents/pyWorkspace/medicaid-utils/medicaid_utils/other_datasets/data/fqhc', chunk_size=300, less_constrained=False)
Performs string-distance-based matching between the HCRIS, UDS, and NPPES datasets. Creates two sets of matches: perfect matches that meet strict name and address similarity criteria, and less-perfect matches that meet more relaxed criteria.
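A sketch of the two-tier split, plus the kind of batching a `chunk_size=300` parameter suggests. The strict (90) and relaxed (60) cut-offs, and the rule of comparing the weaker of the name and address scores, are assumptions; the function's actual criteria are not spelled out here.

```python
from itertools import islice
from typing import Iterable, Iterator

# Hypothetical cut-offs on a 0-100 similarity scale
STRICT = 90.0
RELAXED = 60.0


def chunked(items: Iterable, size: int = 300) -> Iterator[list]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def split_matches(scored: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Partition scored pairs into perfect and fuzzy tiers; drop the rest."""
    perfect, fuzzy = [], []
    for row in scored:
        weakest = min(row["name_score"], row["address_score"])
        if weakest >= STRICT:
            perfect.append(row)
        elif weakest >= RELAXED:
            fuzzy.append(row)
    return perfect, fuzzy


scored = [
    {"npi": "1111111111", "name_score": 95, "address_score": 91},
    {"npi": "2222222222", "name_score": 95, "address_score": 70},
    {"npi": "3333333333", "name_score": 50, "address_score": 99},
]
perfect, fuzzy = split_matches(scored)
print([r["npi"] for r in perfect], [r["npi"] for r in fuzzy])
# → ['1111111111'] ['2222222222']
```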
- medicaid_utils.other_datasets.uds.generate_and_combine_fqhc_npi_crosswalks(logger_name)
Gets NPIs for FQHCs in the HCRIS and UDS datasets and combines them with NPPES NPIs that have an FQHC taxonomy or an FQHC-range CCN. UDS is then merged with all identified NPIs to generate a BHCMISID x FQHC crosswalk.
- medicaid_utils.other_datasets.uds.generate_bhcmisid_fqhc_crosswalk(lst_perfect_npi, lst_fuzzy_npi, logger_name)
Maps all identified NPIs to BHCMISIDs so that FQHCs can be filtered by the years in which they held FQHC status.
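A sketch of the final mapping, assuming perfect and fuzzy NPI lists, an NPI-to-BHCMISID lookup, and the set of UDS years in which each BHCMISID reported as an FQHC. A perfect match overrides a fuzzy one for the same NPI. All names, identifiers, and the row layout are hypothetical.

```python
def bhcmisid_crosswalk(perfect_npis: list[str], fuzzy_npis: list[str],
                       npi_to_bhcmisid: dict[str, str],
                       fqhc_years: dict[str, set[int]]) -> list[dict]:
    """One row per (NPI, year) in which the NPI's BHCMISID had FQHC status."""
    match_type = {npi: "fuzzy" for npi in fuzzy_npis}
    match_type.update({npi: "perfect" for npi in perfect_npis})  # perfect wins
    rows = []
    for npi in sorted(match_type):
        bhcmisid = npi_to_bhcmisid.get(npi)
        if bhcmisid is None:
            continue  # NPI could not be linked to a UDS grantee
        for year in sorted(fqhc_years.get(bhcmisid, ())):
            rows.append({"bhcmisid": bhcmisid, "npi": npi,
                         "year": year, "match_type": match_type[npi]})
    return rows


crosswalk = bhcmisid_crosswalk(
    perfect_npis=["1111111111"],
    fuzzy_npis=["2222222222", "1111111111"],
    npi_to_bhcmisid={"1111111111": "B001", "2222222222": "B002"},
    fqhc_years={"B001": {2019, 2020}, "B002": {2021}},
)
for row in crosswalk:
    print(row)
```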
- medicaid_utils.other_datasets.uds.generate_oscar_fqhc_npis(lst_year=None, pq_engine='fastparquet')
Saves the list of NPIs with FQHC-range OSCAR provider IDs to a pickle file.
- medicaid_utils.other_datasets.uds.nppes_api_requests(pdf, source, logger_name, **dct_request_params)
Makes NPPES API calls, passing the HCLINIC name, city, state, and ZIP code in each request.
- medicaid_utils.other_datasets.uds.process_address_columns(pdf, logger_name, source='hcris')
Combines street-level address components, concatenating multiple values of the same component type, such as multiple street numbers or occupancy identifiers.