medicaid_utils.filters.claims package

Submodules

medicaid_utils.filters.claims.dx_and_proc module

This module provides functions that add indicator flags to claims based on diagnosis and procedure codes

medicaid_utils.filters.claims.dx_and_proc.flag_diagnoses_and_procedures(dct_diag_codes: dict, dct_proc_codes: dict, df_claims: DataFrame, cms_format: str = 'MAX', lst_claim_diag_col: List[str] = None, dct_column_values: dict | None = None) DataFrame[source]

Flags claims based on diagnosis/ procedure codes

Parameters:
  • dct_diag_codes (dict) –

    Dictionary of diagnosis codes. Should be in the format

    {condition_name: {['incl' / 'excl']: {[9/ 10]: list of codes}}}
    

    Eg:

    {'oud_nqf': {'incl': {9: ['3040','3055']}}}
    

  • dct_proc_codes (dict) –

    Dictionary of procedure codes. Should be in the format

    {procedure_name: {procedure_system_code: list of codes} }
    

    Eg:

    {'methadone':
        {6: 'HZ81ZZZ,HZ84ZZZ,HZ85ZZZ,HZ86ZZZ,HZ91ZZZ,HZ94ZZZ,HZ95ZZZ,HZ96ZZZ'.split(",")}
    }
    

  • df_claims (dd.DataFrame) – Claims dataframe

  • cms_format ({'MAX', 'TAF'}) – CMS file format.

  • lst_claim_diag_col (List[str], optional) – List of diagnosis column names

  • dct_column_values (dict, optional) –

    Dictionary of column names and numerical values that should be used to flag conditions and procedures. Should be in the format

    {condn_procedure_name:
        {column_name: list of numerical values} }
    

    Eg:

    {'dx_delivery':
        {'RCPNT_DLVRY_CD': [1]}
    }
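
When a code list is built by splitting a comma-separated string, as in the methadone example, any whitespace or line break left around a code survives the split and ends up inside the code. The following is a minimal sketch of a cleaner way to build such lists. The helper name `parse_code_list` is hypothetical and is not part of medicaid_utils.

```python
from typing import List


def parse_code_list(raw: str) -> List[str]:
    """Split a comma-separated string into a list of clean codes.

    Hypothetical helper for illustration; not part of medicaid_utils.
    """
    # Strip whitespace (including newlines) around each piece and drop empties
    return [piece.strip() for piece in raw.split(",") if piece.strip()]


methadone_codes = parse_code_list(
    "HZ81ZZZ,HZ84ZZZ,HZ85ZZZ, HZ86ZZZ,\n HZ91ZZZ,HZ94ZZZ,HZ95ZZZ,HZ96ZZZ,"
)
dct_proc_codes = {"methadone": {6: methadone_codes}}
print(dct_proc_codes["methadone"][6])
```

The resulting dictionary has the `{procedure_name: {procedure_system_code: list of codes}}` shape described above.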
    

Return type:

dd.DataFrame

Raises:

ValueError – If any ICD/CPT code in dct_diag_codes or dct_proc_codes contains non-alphanumeric characters
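
Since malformed codes raise a ValueError, it can help to check the dictionaries before calling the function. The sketch below walks both dictionary layouts and lists the offending codes. `find_invalid_codes` is a hypothetical helper, and using `str.isalnum` as the test is an assumption about what the library treats as alphanumeric.

```python
from typing import Any, List, Tuple


def find_invalid_codes(dct_codes: dict) -> List[Tuple[str, Any]]:
    """Return (path, code) pairs for codes that are not alphanumeric strings.

    Hypothetical helper for illustration; not part of medicaid_utils.
    """
    invalid: List[Tuple[str, Any]] = []

    def walk(node: Any, path: str) -> None:
        if isinstance(node, dict):
            # Descend through condition / incl-excl / code-system levels
            for key, child in node.items():
                walk(child, f"{path}/{key}" if path else str(key))
        elif isinstance(node, (list, tuple, set)):
            for code in node:
                if not (isinstance(code, str) and code.isalnum()):
                    invalid.append((path, code))
        else:
            # A bare value where a list of codes was expected
            invalid.append((path, node))

    walk(dct_codes, "")
    return invalid


dct_diag_codes = {"oud_nqf": {"incl": {9: ["3040", "3055"]}}}
dct_proc_codes = {"methadone": {6: ["HZ81ZZZ", " HZ86ZZZ"]}}

print(find_invalid_codes(dct_diag_codes))  # []
print(find_invalid_codes(dct_proc_codes))  # [('methadone/6', ' HZ86ZZZ')]
```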

Examples

>>> import pandas as pd
>>> import dask.dataframe as dd
>>> pdf = pd.DataFrame({
...     'MSIS_ID': ['A', 'B'],
...     'DIAG_CD_1': ['3040', '250'],
...     'DIAG_CD_2': ['', '3055'],
...     'service_date': pd.to_datetime(['2020-01-01', '2020-02-01']),
... }).set_index('MSIS_ID')
>>> ddf = dd.from_pandas(pdf, npartitions=1)
>>> dct_diag = {'oud': {'incl': {9: ['3040', '3055']}}}
>>> result = flag_diagnoses_and_procedures(dct_diag, {}, ddf, cms_format='MAX')
>>> result.compute()['diag_oud'].tolist()
[1, 1]

medicaid_utils.filters.claims.dx_and_proc.get_patient_ids_with_conditions(dct_diag_codes: dict, dct_proc_codes: dict, logger_name: str = '/home/runner/work/medicaid-utils/medicaid-utils/medicaid_utils/filters/claims/dx_and_proc.py', cms_format: str = 'MAX', dct_column_values: dict | None = None, **dct_claims: DataFrame) Tuple[DataFrame, Dict[str, DataFrame]][source]

Gets patient ids with conditions denoted by provided diagnosis codes or procedure codes

Parameters:
  • dct_diag_codes (dict) –

    Dictionary of diagnosis codes. Should be in the format

    {condition_name:
        {['incl' / 'excl']: {[9/ 10]: list of codes}}}
    

    Eg:

    {'oud_nqf': {'incl': {9: ['3040','3055']}}}
    

  • dct_proc_codes (dict) –

    Dictionary of procedure codes. Should be in the format

    {procedure_name:
        {procedure_system_code: list of codes} }
    

    Eg:

    {'methadone':
        {6: 'HZ81ZZZ,HZ84ZZZ,HZ85ZZZ,HZ86ZZZ,HZ91ZZZ,HZ94ZZZ,HZ95ZZZ,HZ96ZZZ'.split(",")}
    }
    

  • logger_name (str) – Logger name

  • cms_format ({'MAX', 'TAF'}) – CMS file format.

  • dct_column_values (dict, optional) –

    Dictionary of column names and values that should be used to flag conditions and procedures. Should be in the format

    {condn_procedure_name:
        {column_name: list of values} }
    

    Eg:

    {'dx_delivery':
        {'RCPNT_DLVRY_CD': [1]}
    }
    

  • **dct_claims (dict) –

    Keyword arguments of claim dataframes. Should be in the format:

    {file_type: dask.dataframe}

Return type:

Tuple(pd.DataFrame, dict)

Raises:

IndexError – If the input claim datasets do not have the same index name
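
Because mismatched index names raise an IndexError, the claim dataframes can be checked before they are passed as keyword arguments. The following is a minimal sketch, assuming each dataframe exposes `.index.name` as pandas and dask dataframes do. `check_index_names` and `fake_claims` are hypothetical names, and the `SimpleNamespace` objects only stand in for real dataframes.

```python
from types import SimpleNamespace
from typing import Any, Dict


def check_index_names(dct_claims: Dict[str, Any]) -> Any:
    """Return the shared index name, or raise IndexError if names differ.

    Hypothetical helper for illustration; not part of medicaid_utils.
    """
    names = {df.index.name for df in dct_claims.values()}
    if len(names) != 1:
        raise IndexError(
            f"Claim datasets do not share one index name: {sorted(map(str, names))}"
        )
    return names.pop()


def fake_claims(index_name: str) -> SimpleNamespace:
    # Stand-in for a dataframe; only the .index.name attribute is modelled
    return SimpleNamespace(index=SimpleNamespace(name=index_name))


dct_claims = {"ip": fake_claims("MSIS_ID"), "ot": fake_claims("MSIS_ID")}
print(check_index_names(dct_claims))  # MSIS_ID

# With real dataframes, the checked mapping would then be unpacked as
# get_patient_ids_with_conditions(dct_diag, dct_proc, **dct_claims)
```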

Examples

>>> import pandas as pd
>>> import dask.dataframe as dd
>>> pdf = pd.DataFrame({
...     'MSIS_ID': ['A', 'B', 'A'],
...     'DIAG_CD_1': ['3040', '250', '3055'],
...     'service_date': pd.to_datetime(['2020-01-01', '2020-02-01', '2020-03-01']),
... }).set_index('MSIS_ID')
>>> ddf = dd.from_pandas(pdf, npartitions=1)
>>> dct_diag = {'oud': {'incl': {9: ['3040', '3055']}}}
>>> pdf_ids, dct_stats = get_patient_ids_with_conditions(
...     dct_diag, {}, cms_format='MAX', ip=ddf)
>>> 'ip_diag_oud' in pdf_ids.columns
True

medicaid_utils.filters.claims.rx module

This module provides functions that add indicator flags to claims based on NDC codes

medicaid_utils.filters.claims.rx.flag_prescriptions(dct_ndc_codes: dict, df_claims: DataFrame, ignore_missing_days_supply: bool = False) DataFrame[source]

Flags claims based on NDC codes

Parameters:
  • dct_ndc_codes (dict) –

    Dictionary of NDC codes. Should be in the format

    {condition_name: list of codes}
    

    Eg:

    {'buprenorphine': ['00378451905', '00378451993', '00378617005',
                       '00378617077']}
    

  • df_claims (dd.DataFrame) – Claims dataframe

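  • ignore_missing_days_supply (bool, default=False) – If False, claims with missing, negative, or 0 days of supply are always flagged as 0. If True, days of supply is ignored and such claims are flagged when their NDC code matches.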

Return type:

dd.DataFrame

Examples

>>> import pandas as pd
>>> import dask.dataframe as dd
>>> pdf = pd.DataFrame({
...     'MSIS_ID': ['A', 'B', 'C'],
...     'NDC': ['00378451905', '99999999999', '00378617005'],
...     'DAYS_SUPPLY': ['30', '10', '0'],
... }).set_index('MSIS_ID')
>>> ddf = dd.from_pandas(pdf, npartitions=1)
>>> dct_ndc = {'buprenorphine': ['00378451905', '00378617005']}
>>> result = flag_prescriptions(dct_ndc, ddf)
>>> result.compute()['rx_buprenorphine'].tolist()
[1, 0, 0]

When ignore_missing_days_supply is True, claims with zero or missing days of supply are still flagged:

>>> result2 = flag_prescriptions(dct_ndc, ddf, ignore_missing_days_supply=True)
>>> result2.compute()['rx_buprenorphine'].tolist()
[1, 0, 1]
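
The two results above suggest the flagging rule: a claim is flagged when its NDC code is in the list and either its days of supply is positive or ignore_missing_days_supply is True. Below is a plain-Python sketch of that rule as inferred from the examples. It is not the library's implementation, and `flag_rx` is a hypothetical name.

```python
from typing import List, Optional


def flag_rx(
    ndc: str,
    days_supply: Optional[str],
    codes: List[str],
    ignore_missing_days_supply: bool = False,
) -> int:
    """Return 1 if a single claim would be flagged under the inferred rule."""
    if ndc not in codes:
        return 0
    if ignore_missing_days_supply:
        # Days of supply is not consulted at all
        return 1
    try:
        days = float(days_supply)
    except (TypeError, ValueError):
        return 0  # missing or non-numeric days of supply
    return int(days > 0)


codes = ["00378451905", "00378617005"]
claims = [("00378451905", "30"), ("99999999999", "10"), ("00378617005", "0")]

print([flag_rx(ndc, days, codes) for ndc, days in claims])
# [1, 0, 0]
print([flag_rx(ndc, days, codes, ignore_missing_days_supply=True)
       for ndc, days in claims])
# [1, 0, 1]
```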