medicaid_utils.other_datasets package

Submodules

medicaid_utils.other_datasets.fqhc module

medicaid_utils.other_datasets.fqhc.generate_oscar_fqhc_npis(lst_year=None, pq_engine='fastparquet'): Saves the list of NPIs with FQHC-range OSCAR provider IDs to a pickle file.
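
The FQHC-range filter can be sketched as follows. This is a minimal illustration, not the package's code. It assumes each OSCAR provider ID (CMS Certification Number) is six characters, a two-digit state code followed by a four-digit facility number, and it assumes FQHC facility numbers fall in 1000 to 1199 or 1800 to 1989. The helper `is_fqhc_range_ccn`, the range list, and the sample IDs are hypothetical; check the ranges against CMS documentation before relying on them.

```python
import pickle
import re

# ASSUMPTION: facility-number ranges treated as FQHC; verify against CMS documentation.
FQHC_CCN_RANGES = [(1000, 1199), (1800, 1989)]


def is_fqhc_range_ccn(ccn, ranges=FQHC_CCN_RANGES):
    """Hypothetical helper: True if a CCN's facility number falls in an FQHC range."""
    # Remove hyphens, spaces and any other non-alphanumeric characters
    cleaned = re.sub(r"[^A-Za-z0-9]", "", str(ccn))
    # A CCN is 6 characters: 2-character state code + 4-digit facility number
    if len(cleaned) != 6 or not cleaned[2:].isdigit():
        return False
    facility_number = int(cleaned[2:])
    return any(low <= facility_number <= high for low, high in ranges)


# Toy NPI -> OSCAR provider ID pairs (made-up values)
npi_to_oscar = {"1234567890": "14-1850", "1111111111": "140010"}
fqhc_npis = sorted(npi for npi, ccn in npi_to_oscar.items() if is_fqhc_range_ccn(ccn))

# The package saves the list to a pickle file; here it is serialized in memory instead
payload = pickle.dumps(fqhc_npis)
```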

medicaid_utils.other_datasets.fqhc.get_file_name_dict(source)

medicaid_utils.other_datasets.fqhc.get_fqhc_crosswalk(start_year, data_folder='/Users/Manu_1/Documents/pyWorkspace/medicaid-utils/medicaid_utils/other_datasets/data/lookups/fqhc'): Returns the FQHC crosswalk containing the FQHC NPIs seen in the UDS datasets up to start_year.

medicaid_utils.other_datasets.hcris module

medicaid_utils.other_datasets.hcris.clean_address(address): Cleans address columns by removing place names from street address lines and concatenating multiple occupancy identifiers/street numbers.

medicaid_utils.other_datasets.hcris.clean_hclinic_name(hclinic_name)

medicaid_utils.other_datasets.hcris.clean_zip(zipcode)

medicaid_utils.other_datasets.hcris.combine_and_clean_hcris_files(logger_name): Combines HCRIS provider info files and standardizes column names. The provider ID is cleaned by removing non-alphanumeric characters, and an additional version of the provider ID without leading zeroes is created. Files are downloaded for multiple years, starting with 2014, the earliest year for which archived copies are available:
2014: https://web.archive.org/web/20141120072219/http://downloads.cms.gov/Files/hcris/HCLINIC-REPORTS.zip
2015: https://web.archive.org/web/20151230005431/http://downloads.cms.gov/Files/hcris/HCLINIC-REPORTS.zip
2016: https://web.archive.org/web/20161228145429/http://downloads.cms.gov/Files/hcris/HCLINIC-REPORTS.zip
2017: https://web.archive.org/web/20171220152135/http://downloads.cms.gov/Files/hcris/HCLINIC-REPORTS.zip
2018: https://web.archive.org/web/20181226170727/http://downloads.cms.gov/Files/hcris/HCLINIC-REPORTS.zip
2019: https://web.archive.org/web/20191226132614/http://downloads.cms.gov/Files/hcris/FQHC14-REPORTS.zip
2020: https://web.archive.org/web/20201225141117/https://downloads.cms.gov/Files/hcris/FQHC14-REPORTS.zip
2021: https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/Cost-Reports/FQHC-224-2014-form
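
A minimal pandas sketch of the provider ID cleaning described above, assuming the IDs sit in a string column. The column names `provider_id` and `provider_id_no_zero` and the sample values are hypothetical.

```python
import pandas as pd

# Toy HCRIS-like rows; the column name "provider_id" is hypothetical
df = pd.DataFrame({"provider_id": ["01-1800", " 0A 1800", None]})

# Keep only letters and digits in the provider ID (missing values stay missing)
df["provider_id"] = df["provider_id"].str.replace(r"[^A-Za-z0-9]", "", regex=True)

# Extra version without leading zeroes, useful when another source drops them
df["provider_id_no_zero"] = df["provider_id"].str.lstrip("0")
```

Keeping both versions lets a join succeed whether or not the other dataset has stripped leading zeroes.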

medicaid_utils.other_datasets.hcris.compute_basic_match_purity(pdf): Computes Levenshtein distance-based similarity scores for address, name, and city, and an exact-match indicator for state.
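
One way to compute such scores is sketched below with a plain dynamic-programming Levenshtein distance, scaled to a 0 to 100 similarity by the length of the longer string. The scaling, the upper-casing, and the helper and field names (`similarity`, `basic_match_purity`, `name`, `address`, `city`, `state`) are assumptions; the package may use a library scorer with a different normalization.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """0-100 score: 100 * (1 - distance / longer length). ASSUMED normalization."""
    a, b = a.strip().upper(), b.strip().upper()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def basic_match_purity(src: dict, npi: dict) -> dict:
    """Hypothetical per-pair version: fuzzy scores for name/address/city, exact state check."""
    return {
        "name_score": similarity(src["name"], npi["name"]),
        "address_score": similarity(src["address"], npi["address"]),
        "city_score": similarity(src["city"], npi["city"]),
        "state_match": src["state"].upper() == npi["state"].upper(),
    }


src = {"name": "Main St Health Center", "address": "100 Main St",
       "city": "Chicago", "state": "IL"}
npi = {"name": "Main Street Health Center", "address": "100 Main Street",
       "city": "chicago", "state": "il"}
scores = basic_match_purity(src, npi)
```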

medicaid_utils.other_datasets.hcris.compute_door_no_info(pdf)

medicaid_utils.other_datasets.hcris.compute_match_purity(pdf): Computes Levenshtein distance-based similarity scores for the address and name columns.

medicaid_utils.other_datasets.hcris.create_npi_fqhc_crosswalk(source: str = 'hcris', logger_name: str = 'fqhc_crosswalk'): Merges the HCRIS, UDS, and NPPES datasets to create a list of FQHC NPIs.

medicaid_utils.other_datasets.hcris.filter_partial_matches(df): Filters source x NPPES matches whose names and addresses show some similarity, keeping those with a distance-based similarity score of at least 60.
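
A sketch of the threshold filter, assuming each candidate pair already carries 0 to 100 name and address similarity scores. Whether the package requires both scores, or either one, to reach 60 is not stated here; this version assumes both. The function and field names are hypothetical.

```python
# ASSUMPTION: a pair is kept only when both scores reach the threshold
MIN_SCORE = 60


def filter_partial_matches_sketch(matches, min_score=MIN_SCORE):
    """Hypothetical helper: keep source x NPPES pairs with some name and address similarity."""
    return [
        m for m in matches
        if m["name_score"] >= min_score and m["address_score"] >= min_score
    ]


candidates = [
    {"npi": "1", "name_score": 92, "address_score": 75},
    {"npi": "2", "name_score": 95, "address_score": 40},  # address too dissimilar
    {"npi": "3", "name_score": 60, "address_score": 60},  # exactly at the threshold
]
kept = filter_partial_matches_sketch(candidates)
```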

medicaid_utils.other_datasets.hcris.flatten_nppes_query_result(pdf)

medicaid_utils.other_datasets.hcris.fuzzy_match(df, df_npi_provider, source='hcris', logger_name='fuzzy_match')

medicaid_utils.other_datasets.hcris.get_taxonomies(lstdct_tax)

medicaid_utils.other_datasets.hcris.nppes_api_requests(pdf, logger_name, **dct_request_params): Makes NPPES API calls using the HCLINIC name, city, state, and ZIP code as request parameters.
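
A sketch of how such a request URL might be assembled for the public NPI Registry API (version 2.1), whose query parameters include `organization_name`, `city`, `state`, `postal_code`, `enumeration_type`, and `limit`. The helper `build_nppes_query` is hypothetical, the package's actual request parameters may differ, and the sketch only builds the URL rather than sending it.

```python
from urllib.parse import urlencode

NPPES_API_URL = "https://npiregistry.cms.hhs.gov/api/"


def build_nppes_query(name, city, state, zipcode, limit=200):
    """Hypothetical helper: NPI Registry request URL for an organization lookup."""
    params = {
        "version": "2.1",
        "enumeration_type": "NPI-2",  # organizational providers
        "organization_name": name,
        "city": city,
        "state": state,
        "postal_code": zipcode,
        "limit": limit,  # ASSUMED maximum page size
    }
    # Drop empty fields so they do not over-constrain the search
    params = {k: v for k, v in params.items() if v not in (None, "")}
    return NPPES_API_URL + "?" + urlencode(params)


url = build_nppes_query("Main Street Health Center", "Chicago", "IL", "60601")
# The request itself could then be sent with, e.g., urllib.request.urlopen(url)
```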

medicaid_utils.other_datasets.hcris.process_address_columns(pdf, logger_name, source='hcris'): Combines street-level components and concatenates multiple values belonging to the same address component type, such as multiple street numbers or occupancy identifiers.
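
The concatenation step can be sketched on a tagged address, assuming the `(token, label)` pairs returned by the `usaddress` library's `parse` function, whose labels include `AddressNumber` and `OccupancyIdentifier`. The helper name is hypothetical and the sample address is made up.

```python
def combine_repeated_components(tagged):
    """Hypothetical helper: group (token, label) pairs by label, joining repeats with a space."""
    combined = {}
    for token, label in tagged:
        # Parsed tokens often carry trailing commas; drop them before joining
        combined.setdefault(label, []).append(token.rstrip(","))
    return {label: " ".join(tokens) for label, tokens in combined.items()}


tagged = [
    ("100", "AddressNumber"), ("Main", "StreetName"), ("St,", "StreetNamePostType"),
    ("Ste", "OccupancyType"), ("200", "OccupancyIdentifier"), ("201", "OccupancyIdentifier"),
]
parts = combine_repeated_components(tagged)
```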

medicaid_utils.other_datasets.hcris.standardize_addresses(pdf): Standardizes addresses using the USPS API.

medicaid_utils.other_datasets.hcris.street_address(address): Strips place names, state names, and city names, and returns the street address component.
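
With a `(token, label)` representation of the address (an assumption about the input, modeled on `usaddress.parse` output, where the city is labeled `PlaceName`), stripping place and state names reduces to filtering labels. Dropping `ZipCode` as well is an additional assumption; the helper name and sample address are hypothetical.

```python
# ASSUMPTION: labels removed from the street line; ZipCode is included for illustration
NON_STREET_LABELS = {"PlaceName", "StateName", "ZipCode"}


def street_address_sketch(tagged, drop=NON_STREET_LABELS):
    """Hypothetical helper: keep only street-level tokens and join them."""
    return " ".join(token.rstrip(",") for token, label in tagged if label not in drop)


tagged = [
    ("100", "AddressNumber"), ("Main", "StreetName"), ("St,", "StreetNamePostType"),
    ("Chicago,", "PlaceName"), ("IL", "StateName"), ("60601", "ZipCode"),
]
street = street_address_sketch(tagged)
```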

medicaid_utils.other_datasets.npi module

medicaid_utils.other_datasets.npi.cleanup_raw_npi_files(lst_year: List[int], pq_engine: str = 'fastparquet') → None: Cleans up raw NPPES files for a list of years by standardizing column names, generating long-format mappings from the flat raw file, and creating NPI mappings for a known set of taxonomies.
Parameters: lst_year (list of years to process), pq_engine (Parquet engine: fastparquet or pyarrow).
Returns: None.

medicaid_utils.other_datasets.npi.generate_npi_taxonomy_mappings(year: int, pq_engine: str = 'fastparquet') → None: Generates taxonomy type x NPI mappings for the set of taxonomy types in the taxonomy codes lookup file (taxonomy_codes.csv).
Parameters: year (year to process), pq_engine (Parquet engine: fastparquet or pyarrow).
Returns: None.
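
A minimal pandas sketch of the taxonomy type x NPI mapping step: long-format NPI x taxonomy rows are inner-joined to a lookup of taxonomy codes. The lookup layout (a `taxonomy_code` column and a `taxonomy_type` column) and all column names are assumptions about `taxonomy_codes.csv`; `261QF0400X` is the NUCC code for a Federally Qualified Health Center, and the NPIs are made up.

```python
import pandas as pd

# Long-format NPI x taxonomy rows (hypothetical column names, made-up NPIs)
df_tax = pd.DataFrame({
    "npi": ["1000000001", "1000000002", "1000000003"],
    "taxonomy_code": ["261QF0400X", "207Q00000X", "261QF0400X"],
})

# Lookup in the spirit of taxonomy_codes.csv (ASSUMED layout: code -> taxonomy type)
df_lookup = pd.DataFrame({"taxonomy_code": ["261QF0400X"], "taxonomy_type": ["fqhc"]})

# Inner join keeps only NPIs whose taxonomy belongs to a known type
df_mapping = (
    df_tax.merge(df_lookup, on="taxonomy_code", how="inner")[["taxonomy_type", "npi"]]
    .drop_duplicates()
)
```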

medicaid_utils.other_datasets.npi.generate_oscar_fqhc_npis(lst_year, pq_engine='fastparquet'): Saves the list of NPIs with FQHC-range OSCAR provider IDs to a pickle file.

medicaid_utils.other_datasets.npi.wide_to_long_nppes_taxonomy_file(year: int, pq_engine: str) → None: Creates long-format provider and taxonomy mapping datasets from the NPI file. Two versions of the provider mapping file are created, one of which has provider IDs cleaned by removing non-alphanumeric characters.
Parameters: year (year to process), pq_engine (Parquet engine: fastparquet or pyarrow).
Returns: None.
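
The wide-to-long reshaping can be sketched with `pandas.DataFrame.melt`. The raw NPPES file spreads taxonomy codes across numbered columns; here they are assumed to have been renamed to `taxonomy_code_1`, `taxonomy_code_2`, and so on, which is a hypothetical naming, and the NPIs are made up.

```python
import pandas as pd

# Wide layout: one row per NPI, one column per taxonomy slot (hypothetical names)
df_wide = pd.DataFrame({
    "npi": ["1000000001", "1000000002"],
    "taxonomy_code_1": ["261QF0400X", "207Q00000X"],
    "taxonomy_code_2": ["261QP2300X", None],
})

tax_cols = [c for c in df_wide.columns if c.startswith("taxonomy_code_")]
df_long = (
    df_wide.melt(id_vars="npi", value_vars=tax_cols,
                 var_name="slot", value_name="taxonomy_code")
    # Empty slots become rows with missing codes; drop them
    .dropna(subset=["taxonomy_code"])
    # Keep the slot number from the column suffix as an integer
    .assign(slot=lambda d: d["slot"].str.rsplit("_", n=1).str[-1].astype(int))
    .sort_values(["npi", "slot"])
    .reset_index(drop=True)
)
```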

medicaid_utils.other_datasets.uds module

medicaid_utils.other_datasets.uds.clean_address(address): Cleans address columns by removing place names from street address lines and concatenating multiple occupancy identifiers/street numbers.

medicaid_utils.other_datasets.uds.clean_hclinic_name(hclinic_name): Cleans HCLINIC names by removing special characters and extra spaces.
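
A sketch of the name cleaning, using regular expressions to delete characters other than letters, digits, and whitespace and then collapse repeated spaces. Upper-casing is an added assumption that makes later comparisons case-insensitive; the helper name is hypothetical.

```python
import re


def clean_name_sketch(name):
    """Hypothetical helper: upper-case, delete special characters, collapse spaces."""
    name = re.sub(r"[^A-Za-z0-9\s]", "", str(name).upper())
    return re.sub(r"\s+", " ", name).strip()


cleaned = clean_name_sketch("St. Mary's  Health Center, Inc.")
```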

medicaid_utils.other_datasets.uds.clean_zip(zipcode): Removes hyphens from ZIP codes.

medicaid_utils.other_datasets.uds.combine_and_clean_uds_files(lst_year=None, logger_name='uds'): Standardizes UDS column names and flattens the files.

medicaid_utils.other_datasets.uds.combine_uds_fqhc_npi_crosswalks(lst_year, logger_name, source='uds'): Combines UDS x NPPES matches from all years.

medicaid_utils.other_datasets.uds.compute_basic_match_purity(pdf): Computes Levenshtein distance-based similarity scores for address, name, and city, and an exact-match indicator for state.

medicaid_utils.other_datasets.uds.compute_door_no_info(pdf): Parses address components and extracts door (street) number information.

medicaid_utils.other_datasets.uds.compute_match_purity(pdf): Computes Levenshtein distance-based similarity scores for the address and name columns.

medicaid_utils.other_datasets.uds.create_npi_fqhc_crosswalk(source: str = 'hcris', year: int | None = None, logger_name: str = 'fqhc_crosswalk'): Merges the HCRIS, UDS, and NPPES datasets to create a list of FQHC NPIs.

medicaid_utils.other_datasets.uds.filter_partial_matches(df, less_constrained=False): Filters source x NPPES matches whose names and addresses show some similarity, keeping those with a distance-based similarity score of at least 60.

medicaid_utils.other_datasets.uds.flatten_nppes_query_result(pdf)

medicaid_utils.other_datasets.uds.fuzzy_match(df, df_npi_provider, source='hcris', logger_name='fuzzy_match', export_folder='/Users/Manu_1/Documents/pyWorkspace/medicaid-utils/medicaid_utils/other_datasets/data/fqhc', chunk_size=300, less_constrained=False): Performs string-distance-based matching between the HCRIS, UDS, and NPPES datasets. Creates two sets of matches: perfect matches that meet strict name and address similarity criteria, and imperfect matches that meet more relaxed criteria.
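
The two tiers can be sketched as threshold checks on per-pair similarity scores. The cutoffs below (90 for perfect matches, 60 for relaxed ones), the field names, and the helper names are illustrative assumptions, not the package's actual criteria.

```python
# ASSUMED cutoffs, for illustration only
STRICT = {"name_score": 90, "address_score": 90}
RELAXED = {"name_score": 60, "address_score": 60}


def meets(row, thresholds):
    """True if every score column reaches its cutoff."""
    return all(row[col] >= cut for col, cut in thresholds.items())


def split_matches(rows):
    """Hypothetical helper: separate strict matches from relaxed-only matches."""
    perfect = [r for r in rows if meets(r, STRICT)]
    fuzzy = [r for r in rows if not meets(r, STRICT) and meets(r, RELAXED)]
    return perfect, fuzzy


rows = [
    {"npi": "1", "name_score": 95, "address_score": 92},
    {"npi": "2", "name_score": 70, "address_score": 88},
    {"npi": "3", "name_score": 50, "address_score": 99},  # fails both tiers
]
perfect, fuzzy = split_matches(rows)
```

Keeping the tiers separate lets downstream steps treat relaxed matches with more caution than strict ones.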

medicaid_utils.other_datasets.uds.generate_and_combine_fqhc_npi_crosswalks(logger_name): Gets NPIs for FQHCs in the HCRIS and UDS datasets, combines them with NPIs that have an FQHC taxonomy or an FQHC-range CCN in the NPPES dataset, and merges UDS with all the identified NPIs to generate a BHCMISID x FQHC crosswalk.

medicaid_utils.other_datasets.uds.generate_bhcmisid_fqhc_crosswalk(lst_perfect_npi, lst_fuzzy_npi, logger_name): Maps all identified NPIs to BHCMISIDs so that FQHCs can be filtered by the years in which they had FQHC status.
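
A minimal pandas sketch of the NPI-to-BHCMISID mapping and the year filter, assuming one table links NPIs to BHCMISIDs and another lists the years in which each BHCMISID reported FQHC status. Column names, IDs, and years are placeholders, and `fqhc_npis_for_year` is hypothetical.

```python
import pandas as pd

# Placeholder inputs (hypothetical column names and IDs)
df_npi_bhcmisid = pd.DataFrame({
    "npi": ["1000000001", "1000000002", "1000000003"],
    "bhcmisid": ["BHC-A", "BHC-A", "BHC-B"],
})
# Years in which each BHCMISID had FQHC status
df_fqhc_years = pd.DataFrame({
    "bhcmisid": ["BHC-A", "BHC-A", "BHC-B"],
    "year": [2016, 2017, 2018],
})

# One row per NPI x BHCMISID x FQHC year
df_crosswalk = df_npi_bhcmisid.merge(df_fqhc_years, on="bhcmisid", how="inner")


def fqhc_npis_for_year(crosswalk, year):
    """NPIs whose linked BHCMISID had FQHC status in the given year."""
    return sorted(crosswalk.loc[crosswalk["year"] == year, "npi"].unique())
```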

medicaid_utils.other_datasets.uds.generate_oscar_fqhc_npis(lst_year=None, pq_engine='fastparquet'): Saves the list of NPIs with FQHC-range OSCAR provider IDs to a pickle file.

medicaid_utils.other_datasets.uds.get_file_name_dict(source)

medicaid_utils.other_datasets.uds.get_taxonomies(lstdct_tax): Concatenates taxonomies.

medicaid_utils.other_datasets.uds.nppes_api_requests(pdf, source, logger_name, **dct_request_params): Makes NPPES API calls using the HCLINIC name, city, state, and ZIP code as request parameters.

medicaid_utils.other_datasets.uds.process_address_columns(pdf, logger_name, source='hcris'): Combines street-level components and concatenates multiple values belonging to the same address component type, such as multiple street numbers or occupancy identifiers.

medicaid_utils.other_datasets.uds.standardize_addresses(pdf): Standardizes addresses using the USPS API.

medicaid_utils.other_datasets.uds.street_address(address): Strips place names, state names, and city names, and returns the street address component.

medicaid_utils.other_datasets.zip module

medicaid_utils.other_datasets.zip.generate_zip_pcsa_ruca_crosswalk(): Generates a ZIP code x PCSA x RUCA crosswalk.
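
A sketch of the crosswalk as a ZIP-level join in pandas. The column names and values are placeholders, not the package's sources; ZIP codes are kept as strings so leading zeroes survive, and a left join keeps ZIP codes that have no RUCA code.

```python
import pandas as pd

# Placeholder crosswalk fragments (hypothetical column names and values)
df_zip_pcsa = pd.DataFrame({"zip": ["60601", "60602", "61801"], "pcsa": ["P1", "P1", "P2"]})
df_zip_ruca = pd.DataFrame({"zip": ["60601", "61801"], "ruca": [1.0, 4.0]})

# Left join keeps every ZIP from the PCSA table; ZIPs without a RUCA code get NaN.
# validate="one_to_one" raises if either side has duplicate ZIPs.
df_crosswalk = df_zip_pcsa.merge(df_zip_ruca, on="zip", how="left", validate="one_to_one")
```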

medicaid_utils.other_datasets.zip.pool_zipcode_pcsa_datasets(): Combines multiple sources of ZIP code and PCSA datasets.