Why we need more professionals with technical skills in the HIM Industry

Photo by Luca Bravo on Unsplash

At the end of August I enrolled in a Data Analysis bootcamp through Practicum By Yandex. At the time, my goal was to develop skills that would make me a better analyst. I knew the experience would leave a lasting impression and give me skills that my HIM background and education had not provided. I have always been intrigued by anything tech related. I realized that while the RHIA program gave me a general overview of the many technologies used in HIM today, training in more advanced applications would have been helpful to apply in healthcare.

It took me a solid 7 months to complete the program. We had access to a tutor and to other students through a chat program called Slack. Although the curriculum was focused on data, it did not place as much emphasis on math, statistics, or machine learning concepts as the Data Science path did. Despite this, I felt that the program gave me an essential overview of the statistics commonly used in Data Analytics. My goal in pursuing this bootcamp was to gain additional skills while working towards the CHDA certification through AHIMA this year. The curriculum used the Python programming language, and we worked in Jupyter Notebook to query the data, draw insights from it, and tell its story. The insight came from exploring the data, or performing EDA (Exploratory Data Analysis), by typing commands in the notebook. We also did some querying with SQL (Structured Query Language) and created some Tableau dashboards.

Here is a dataset that is a subset of the inpatient charge data from FY 2017. If you want to do some analysis of this dataset on your own, feel free to check it out here.

In this data we looked at patients who were reported as having Congestive Heart Failure. First, I looked at the mean length of stay (LOS) for the male vs. female population of the dataset to gain some additional insight into population differences between genders.

# mean LOS by gender
import pandas as pd
import matplotlib.pyplot as plt

# df is the CHF dataset loaded into a pandas DataFrame
CHFgenders = df.groupby('Gender Description', as_index=False).agg({'LOS': 'mean'})

barplot = CHFgenders.plot(x='Gender Description', y='LOS', kind='bar')
plt.title('Average LOS by Gender')
plt.ylabel('LOS', fontsize=12)
plt.xlabel('Gender', fontsize=12)

We see that the mean length of stay for males and females is pretty similar, with the female population being slightly higher.

Then I evaluated the mean patient count per DRG by gender to look for trends in the top DRGs reported in the data set.

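# mean count of patients with the DRG by gender
import seaborn as sns

# select the 'Patient' column before aggregating so non-numeric columns are not averaged
DRG_by_gender = df.groupby(['drgcode', 'Gender Description'])['Patient'].mean().reset_index()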

plt.figure(figsize=(10, 7))
sns.barplot(data=DRG_by_gender, x='drgcode', y='Patient', hue='Gender Description')
plt.title('Average DRG by Gender', size=16)
plt.ylabel('Patient', fontsize=14)
plt.xlabel('DRG', fontsize=14)

This data demonstrates that the top DRG for females is DRG 292: HEART FAILURE AND SHOCK WITH COMPLICATION OR COMORBIDITY (CC). The top DRG for males in this data set is DRG 291: HEART FAILURE AND SHOCK WITH MAJOR COMPLICATION OR COMORBIDITY (MCC). I found it a bit interesting that the reported severity was lower for females than for males.

I then did some additional evaluation, comparing the average discharge by gender to see what the trends were for males compared to females in this data set.

# discharge by gender pivot table
# select the 'Patient' column before aggregating so non-numeric columns are not averaged
discharge_by_gender = df.groupby(['Discharge Destination Description', 'Gender Description'])['Patient'].mean().reset_index()

#discharge by gender barchart
plt.figure(figsize=(10, 7))
sns.barplot(data=discharge_by_gender, x='Discharge Destination Description', y='Patient', hue='Gender Description')
plt.title('Average Discharge by Gender', size=16)
plt.ylabel('Patient', fontsize=14)
plt.xlabel('Discharge Description', fontsize=14)

I then split out the male and female populations from this dataset and created separate pie charts that show this information at a more granular level by gender.

dischargeByfemale = df[df['Gender Description']=='Female'].groupby('Discharge Destination Description', as_index=False).agg({'Patient' : pd.Series.nunique})

dischargeByfemale.groupby(['Discharge Destination Description']).sum().plot(kind='pie', y='Patient',startangle=45,
figsize=(15,15), autopct='%1.1f%%')
plt.title('Discharge Description')
plt.show()  # finally, show the plot

The top discharge destinations for females are home care and self care.

dischargeBymale = df[df['Gender Description']=='Male'].groupby('Discharge Destination Description', as_index=False).agg({'Patient' : pd.Series.nunique})

dischargeBymale.groupby(['Discharge Destination Description']).sum().plot(kind='pie', y='Patient',startangle=45,
figsize=(15,15), autopct='%1.1f%%')
plt.title('Discharge Description')
plt.show()  # finally, show the plot

The top discharge destinations for males are SNF and self care.
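
The two pie charts can also be read side by side as a single table of percentages. Here is a minimal sketch using pd.crosstab. The rows below are synthetic stand-in data, not the CHF dataset, and the destination labels are illustrative rather than the dataset's exact values.

```python
import pandas as pd

# Synthetic stand-in rows (NOT the real CHF dataset); destination labels are illustrative.
demo = pd.DataFrame({
    'Gender Description': ['Female', 'Female', 'Female', 'Female',
                           'Male', 'Male', 'Male', 'Male'],
    'Discharge Destination Description': ['Home care', 'Home care', 'Self care', 'SNF',
                                          'SNF', 'SNF', 'Self care', 'Home care'],
})

# Share of each discharge destination within each gender, as percentages.
discharge_share = pd.crosstab(
    demo['Discharge Destination Description'],
    demo['Gender Description'],
    normalize='columns',
) * 100

print(discharge_share.round(1))
```

Note that this counts rows rather than unique patients, so it lines up with the pie charts only when each patient appears once in the data.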

Let's do some hypothesis testing to evaluate this information. When we review the mean length of stay, we see a small difference between females and males. The alternative hypothesis states that the count of patients with certain DRGs differs between males and females, and the data above supports this.

t-test comparing males to females
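
For readers who want to try a comparison like this themselves, here is a minimal sketch of a two-sample t-test on LOS by gender with scipy. The data frame is synthetic stand-in data (not the CHF dataset), the helper name compare_los_by_gender is hypothetical, and the choice of Welch's t-test (equal_var=False) is an assumption made for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in data (NOT the real CHF dataset); column names mirror the ones used above.
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    'Gender Description': ['Female'] * 200 + ['Male'] * 200,
    'LOS': np.concatenate([
        rng.poisson(5, 200),  # hypothetical female lengths of stay, in days
        rng.poisson(5, 200),  # hypothetical male lengths of stay, in days
    ]),
})

def compare_los_by_gender(frame, alpha=0.05):
    """Welch's t-test on LOS for females vs. males; returns (p-value, reject H0?)."""
    female_los = frame.loc[frame['Gender Description'] == 'Female', 'LOS']
    male_los = frame.loc[frame['Gender Description'] == 'Male', 'LOS']
    # equal_var=False selects Welch's t-test (an assumption for this sketch)
    result = stats.ttest_ind(female_los, male_los, equal_var=False)
    return result.pvalue, bool(result.pvalue < alpha)

p_value, reject_null = compare_los_by_gender(demo)
print('p-value:', round(p_value, 3))
print('Reject H0 (the means are equal):', reject_null)
```

H0 here is that the mean LOS is the same for both groups; a p-value below alpha would lead us to reject it.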

However, when we do a more detailed hypothesis test comparing females to males for each DRG, we see different results.

First, we have to break out the dataset by DRG:

# DRG breakdown of dataset
DRG291 = DRG_by_gender.query('drgcode == 291')
DRG292 = DRG_by_gender.query('drgcode == 292')
DRG293 = DRG_by_gender.query('drgcode == 293')

# Check whether the difference between the groups is statistically significant for DRG 291
import scipy.stats as stats

female_291 = DRG291[DRG291['Gender Description'] == "Female"]['Patient']
male_291 = DRG291[DRG291['Gender Description'] == "Male"]['Patient']

alpha = .05  # significance level

results2 = stats.mannwhitneyu(female_291, male_291)

print("{0:.3f}".format(results2.pvalue))
# relative difference between the female and male means
print("{0:.3f}".format(female_291.mean() / male_291.mean() - 1))

print('p-value: ', results2.pvalue)

if results2.pvalue < alpha:
    print("H1 (the alternative hypothesis): there is a statistically significant difference in DRG 291 in average population of females and males")
else:
    print("H0 (the null hypothesis): there's not a statistically significant difference in DRG 291 in average population of females and males")

Results of hypothesis test for DRG 291
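
The test above is written out for DRG 291 only. Here is a minimal sketch of how the same check could be looped over every DRG. The Mann-Whitney U test compares two samples, so each group needs several observations; for that reason the sketch runs on row-level records rather than on a table that has already been averaged down to one row per DRG and gender. The data is synthetic stand-in data (not the CHF dataset), the helper name mannwhitney_by_drg is hypothetical, and LOS is used as the measure purely for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic row-level stand-in data (NOT the real CHF dataset).
rng = np.random.default_rng(0)
n = 300
demo = pd.DataFrame({
    'drgcode': rng.choice([291, 292, 293], size=n),
    'Gender Description': rng.choice(['Female', 'Male'], size=n),
    'LOS': rng.poisson(5, size=n),  # hypothetical measure to compare
})

def mannwhitney_by_drg(frame, value_col, alpha=0.05):
    """Run a two-sided Mann-Whitney U test (female vs. male) within each DRG."""
    rows = []
    for drg, group in frame.groupby('drgcode'):
        female = group.loc[group['Gender Description'] == 'Female', value_col]
        male = group.loc[group['Gender Description'] == 'Male', value_col]
        result = stats.mannwhitneyu(female, male, alternative='two-sided')
        rows.append({
            'drgcode': drg,
            'n_female': len(female),
            'n_male': len(male),
            'p_value': result.pvalue,
            'significant': bool(result.pvalue < alpha),
        })
    return pd.DataFrame(rows)

summary = mannwhitney_by_drg(demo, value_col='LOS')
print(summary)
```

The n_female and n_male columns are included so it is easy to see how many observations each test was based on.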

I followed the same process with DRG 292 and 293 and came to the same result: there is not a statistically significant difference between the average populations of females and males. So, although my initial review demonstrated a statistically significant difference between the female and male populations, there was no statistically significant difference when we looked at the individual DRGs. You can find the GitHub repo for this project here.

I would not have been able to uncover these relationships without my bootcamp experience. Although it was extremely challenging, the experience made me see how much tech skills are needed in the HIM profession. The bootcamp curriculum was very different from my HIM degree, but it challenged me to think about how we can solve the problems we encounter in healthcare by sifting through data. I hope you enjoyed this example of how I used Python and its libraries to perform exploratory data analysis with publicly available data sets. I challenge future HIM professionals to evaluate how gaining additional tech skills can help them excel in the field.