The objective of my internship at arinti was to research different AI techniques to personalize PRO-CTCAE questionnaires for lung cancer patients.
PRO-CTCAE is a patient-reported outcome (PRO) measurement system that was developed to evaluate symptomatic toxicity in patients in cancer clinical trials. It was designed to be used as a companion to the Common Terminology Criteria for Adverse Events (CTCAE), the standard lexicon for adverse event reporting in cancer clinical trials. The PRO-CTCAE Item Library includes 124 items representing 78 symptomatic toxicities drawn from the CTCAE.
PRO-CTCAE items evaluate the symptom attributes of frequency, severity, interference, amount, and presence/absence. Each symptomatic adverse event (AE) is assessed by 1-3 attributes. The questionnaires are sent to the patients on a weekly basis, during key periods in the medical trial (e.g. the first two cycles in an early-phase trial), or at other crucial clinical assessment timeframes, based on knowledge of the anticipated toxicity profile of the treatment.
PRO-CTCAE responses are scored from 0 to 4 (or 0/1 for absent/present), which yields up to three patient-reported scores per symptomatic toxicity. There are currently no standardized scoring rules for combining attributes into a single score or for analyzing PRO-CTCAE data longitudinally. Scores for each attribute (frequency, severity and/or interference) are presented descriptively (e.g. summary statistics or graphical presentations).
About the database
The database for my internship comprises only 35 lung cancer patients. Each patient has a unique patient ID and fills in the questionnaire on a weekly basis (for between 1 and 46 weeks). The patients answered 18 of the 80 questions in PRO-CTCAE, as selected by the medical and nursing staff of the hospital that helped us carry out this internship project. As mentioned earlier, each question relates to a specific body part and assesses the magnitude of the symptom (frequency, severity, and interference). A composite score (ranging from 0 to 4) was calculated for each question. In addition, there are columns for tumor classification (which assesses the size of the tumor for each week in which the patient fills in the questionnaire), a World Health Organisation (WHO) score (a measure of independence in activities of daily life), and age (measured in intervals of 5 years). In summary, it is a data frame with 544 rows × 55 columns.
Added value of the internship
Lung cancer is a life-threatening disease that has a major impact on patients' lives and requires robust treatment in the form of chemotherapy, radiotherapy, or sometimes both. These therapies improve the quality of life of patients with lung cancer but unfortunately also have accompanying adverse effects, which require continuous monitoring. That is why PRO-CTCAE is used on a weekly basis. But PRO-CTCAE has 80 domains on which the patients need to report every week. Considering that these patients are often elderly and differ in biological make-up, a one-size-fits-all approach to selecting the questions that need to be answered (as mentioned above) might not be appropriate. Hence, the added value of this internship is that the PRO-CTCAE questionnaire is personalized using state-of-the-art AI techniques. This enables an individualized approach and saves patients time when filling in the questionnaires, since each patient only receives the questions that are relevant to them.
The goal of my internship was to create a system that would ask each patient the 10 questions that are both most important and “least known”.
I used the following methodology in my project:
- Goal: Maximal importance and uncertainty of recommended questions
That is, we prefer to ask the questions that matter most and that we know least about.
- Exploratory Data Analysis
- Clustering of patients: by their tumor_class and WHO_score
- Computing the importance of questions: measured by the frequency a question has been asked within the same cluster of patients
- Computing the uncertainty of questions:
  - The standard deviation of answers by patients within the same cluster
  - The standard deviation of answers by the same patient in different weeks
- Added feature: question categories (by body parts)
- Feature engineering of previous answers
- The weighting of the same questions answered by the same patient in the past weeks: can you measure the decay of certainty?
- The weighting of the same questions answered by other patients within the same cluster of patients
- Computing personalized recommendations for each patient
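The question about the decay of certainty in the list above can be made concrete with a small sketch. It assumes, purely for illustration, that an answer loses half of its weight every few weeks; the exponential form, the `half_life_weeks` parameter and the function name are hypothetical and not taken from the project.

```python
def decayed_average(answers, current_week, half_life_weeks=4.0):
    """Weighted mean of past answers, where older answers count less.

    answers is a list of (week, score) pairs for one patient and one question.
    The exponential decay and its half-life are illustrative assumptions.
    """
    weights = [0.5 ** ((current_week - week) / half_life_weeks)
               for week, _ in answers]
    weighted_sum = sum(w * score for w, (_, score) in zip(weights, answers))
    return weighted_sum / sum(weights)


# Made-up history: a score of 4 in week 1 and a score of 0 in week 5.
recent_estimate = decayed_average([(1, 4), (5, 0)], current_week=5)
```

With a half-life of four weeks, the week 1 answer counts half as much as the week 5 answer, so the estimate leans towards the more recent score.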
As a first step, I performed some exploratory analysis on my dataset, for example by visualizing some of the collinear features against the patients' age group.
For age, for example, ranking the patients in intervals of 5 years made the dominant groups clearly visible. A box plot clearly showed that the “35 years” age group is an outlier. We already know that cancer is a chronic disease found predominantly in patients of advanced age, and we can now confirm this for our dataset.
Data cleaning & patient clustering
The data frame is 544 × 55 and contains categorical variables, with tumor classifications stored as strings. All features were categorical except the patient ID and the number of weeks for which a patient fills in the questionnaire. We chose K-means for the clustering, so we had to pre-process the data further and convert the tumor classification to a numeric type before the actual clustering. There were some missing values in questions concerning the severity of some symptoms, and these were replaced with zero. The exploratory data analysis had shown that such questions had a score of zero in the preceding column (asking about the frequency), which means the patient answered “never” when asked about experiencing these symptoms.
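A minimal sketch of this imputation rule, in plain Python so that it runs without extra packages. The column names (`nausea_frequency`, `nausea_severity`) and the rows are hypothetical. The sketch is also slightly more conservative than the text: it only fills a missing severity when the matching frequency is zero.

```python
def fill_missing_severity(rows, freq_col="nausea_frequency",
                          sev_col="nausea_severity"):
    """Set a missing severity to 0 when the matching frequency is 0 ("never")."""
    cleaned = []
    for row in rows:
        row = dict(row)  # copy, so the input rows are left untouched
        if row.get(sev_col) is None and row.get(freq_col) == 0:
            row[sev_col] = 0
        cleaned.append(row)
    return cleaned


# Made-up weekly answers; None marks a missing value.
rows = [
    {"patient_id": 1, "week": 1, "nausea_frequency": 0, "nausea_severity": None},
    {"patient_id": 1, "week": 2, "nausea_frequency": 3, "nausea_severity": 2},
    {"patient_id": 2, "week": 1, "nausea_frequency": 2, "nausea_severity": None},
]
cleaned = fill_missing_severity(rows)
```

A missing severity next to a non-zero frequency is left as it is, which makes any case that does not fit the “never” explanation easy to spot.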
For the clustering of patients I experimented with two methods, K-means and K-modes. We tried K-modes because our data frame contains categorical variables. First, we used a one-hot encoder to create dummy variables. Second, we used the elbow method to find the number of clusters in our dataset. However, the elbow method revealed no apparent clusters.
We concluded that, although our dataset contains categorical variables, K-modes was not the appropriate algorithm, because there is no natural ordering in the data set.
To make K-means possible on the data set, we defined a mapping from tumor classification to numerical values, while WHO_score was already numeric. For the tumor classification, we essentially assigned a weight to each tumor size level and used those weights in our analysis. We used the elbow method and found two clusters in our dataset. Thereafter, we implemented the K-means clustering. The features of each cluster of patients are described below.
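The mapping and clustering steps can be sketched as follows. This is an illustration in plain Python, not the code used in the project: the weights in `STAGE_TO_NUMBER`, the example patients and the deterministic farthest-point initialization are all assumptions, and the features are left unscaled for brevity. In practice a library implementation of K-means would do the same job.

```python
# Hypothetical ordinal weights for the stages named in this post; the values
# actually used in the project are not given here.
STAGE_TO_NUMBER = {"IA2": 1, "IA3": 2, "IIIA": 3, "IIIB": 4, "IIIC": 5,
                   "IV": 6, "IVA": 7, "IVB": 8}


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Deterministic start: the first point, then repeatedly the farthest point."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iterations=20):
    """Plain K-means: assign points to the nearest centroid, then update."""
    centroids = init_centroids(points, k)
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(v) / len(members)
                                     for v in zip(*members))
    return labels, centroids


def inertia(points, labels, centroids):
    """Within-cluster sum of squares, the quantity plotted in an elbow chart."""
    return sum(sq_dist(p, centroids[label])
               for p, label in zip(points, labels))


# Made-up patients: (tumor_class, WHO_score).
patients = [("IA2", 0), ("IA3", 1), ("IIIA", 1), ("IIIB", 1),
            ("IVA", 3), ("IVB", 4), ("IV", 3)]
points = [(STAGE_TO_NUMBER[stage], who) for stage, who in patients]

labels, centroids = kmeans(points, k=2)
# Inertia per candidate k; the "elbow" is where the drop levels off.
elbow = {k: inertia(points, *kmeans(points, k)) for k in range(1, 5)}
```

The cluster numbers produced by this sketch are arbitrary labels and need not match the cluster_0 and cluster_1 naming used in this post.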
The most distinguishing feature of the clusters was tumor classification. Only patients with a tumor classification of IVA, IVB, or IV were found in cluster_0, while only patients with a tumor classification of IIIA, IIIB, IIIC, IA2, or IA3 were found in cluster_1. In addition, patients with a WHO_score of 4 were found only in cluster_0. In summary, patients in cluster_0 had more advanced tumors and greater dependency in activities of daily life than patients in cluster_1.
After identifying the clusters using K-means, we integrated the cluster labels into the dataset. We dropped the columns containing the raw scores of the questions and used only their composite scores, since the composite score could tell us more about the importance of the questions.
Computing the importance of questions
Based on the exploratory data analysis, we understood that the researchers had computed a composite “total_score” for each question by averaging the scores of the question's sub-domains. For example, the total_score for the question about nausea would be the average of its frequency and severity scores. As the results show, the magnitude of the symptoms across different body parts is greater in cluster_0 than in cluster_1. The importance of each question, however, is calculated as the frequency of non-zero scores in its total_score.
The “uncertainty” of questions was computed in two forms: first, the standard deviation of answers by patients within the same cluster; second, the standard deviation of answers by the same patient in different weeks.
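As a rough illustration of these two computations, the sketch below averages attribute scores into a total_score and then measures importance per cluster. It assumes that “frequency of non-zero scores” means the share of answers with a non-zero total_score; the record layout and the function names are hypothetical.

```python
from collections import defaultdict


def total_score(attribute_scores):
    """Composite score for one question: the mean of its attribute scores."""
    return sum(attribute_scores) / len(attribute_scores)


def importance_by_cluster(records):
    """Share of answers with a non-zero total_score per (cluster, question)."""
    nonzero = defaultdict(int)
    answered = defaultdict(int)
    for record in records:
        key = (record["cluster"], record["question"])
        answered[key] += 1
        if record["total_score"] > 0:
            nonzero[key] += 1
    return {key: nonzero[key] / answered[key] for key in answered}


# Made-up answers; nausea has two attributes (frequency, severity).
records = [
    {"cluster": 0, "question": "nausea", "total_score": total_score([2, 1])},
    {"cluster": 0, "question": "nausea", "total_score": total_score([0, 0])},
    {"cluster": 0, "question": "cough", "total_score": total_score([3, 2, 1])},
    {"cluster": 1, "question": "nausea", "total_score": total_score([0, 0])},
]
importance = importance_by_cluster(records)
```

Using a share rather than a raw count keeps clusters of different sizes comparable, which is a design choice of this sketch and not a statement about the original analysis.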
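Both forms of uncertainty come down to grouping the answers and taking a standard deviation per group. The sketch below uses the population standard deviation from Python's `statistics` module. Whether the project used the population or the sample version is not stated here, and the record layout is hypothetical.

```python
from collections import defaultdict
from statistics import pstdev


def std_by(records, key_fields):
    """Population standard deviation of total_score per group of records."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in key_fields)
        groups[key].append(record["total_score"])
    return {key: pstdev(scores) for key, scores in groups.items()}


# Made-up weekly answers to one question by two patients in the same cluster.
records = [
    {"patient_id": 1, "cluster": 0, "week": 1, "question": "nausea", "total_score": 0},
    {"patient_id": 1, "cluster": 0, "week": 2, "question": "nausea", "total_score": 2},
    {"patient_id": 2, "cluster": 0, "week": 1, "question": "nausea", "total_score": 4},
    {"patient_id": 2, "cluster": 0, "week": 2, "question": "nausea", "total_score": 4},
]

# Form 1: spread of answers across patients within the same cluster.
within_cluster = std_by(records, ["cluster", "question"])
# Form 2: spread of one patient's answers across weeks.
within_patient = std_by(records, ["patient_id", "question"])
```

In this example patient 2 always gives the same answer, so that patient's own uncertainty is zero even though the cluster as a whole still varies.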
Personalized recommendations for each patient
The individualized scores of questions for each patient were computed using the formula illustrated in the method section earlier in this report, with the weights assigned to the importance and uncertainty values treated as hyperparameters. By assigning more weight to importance than to uncertainty, and vice versa, we obtained different results. For deployment in weekly assessment during key periods in a clinical trial, these hyperparameters could be tuned depending on the phase of the treatment. If the treatment is in the first two cycles of an early-phase trial, more weight should be given to the uncertainty of questions than to their importance, because for each therapy, drug manufacturers inform users beforehand of the anticipated toxicity associated with the use of their product. Hence, we already know the adverse effects of the drug that matter most; what we do not know is the toxicity that is still uncertain. That is why we are interested in asking the questions that we know least about. On the other hand, if the treatment is at other crucial assessment timeframes based upon the anticipated toxicity profile of the drug, more weight should be given to the importance of questions than to their uncertainty, because here we are interested in asking what matters most for this specific patient.
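To show how the two weights change the recommendations, here is a minimal sketch that assumes the score is a simple weighted sum of importance and uncertainty, both scaled to the range 0 to 1. The weighted-sum form, the function name `recommend` and the example values are assumptions made for illustration.

```python
def recommend(importance, uncertainty, w_importance=0.5, w_uncertainty=0.5,
              top_k=10):
    """Rank questions by a weighted sum of importance and uncertainty."""
    scores = {
        question: w_importance * importance[question]
        + w_uncertainty * uncertainty.get(question, 0.0)
        for question in importance
    }
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda question: (-scores[question], question))
    return ranked[:top_k]


# Made-up values for one patient, both scaled to the range 0 to 1.
importance = {"nausea": 0.9, "cough": 0.6, "rash": 0.1}
uncertainty = {"nausea": 0.1, "cough": 0.5, "rash": 1.0}

# Emphasis on what matters most (e.g. later assessment timeframes).
importance_first = recommend(importance, uncertainty, 0.8, 0.2, top_k=2)
# Emphasis on what we know least (e.g. the first cycles of an early-phase trial).
uncertainty_first = recommend(importance, uncertainty, 0.2, 0.8, top_k=2)
```

With these numbers the importance-heavy setting asks about nausea and cough, while the uncertainty-heavy setting asks about rash and cough.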
Thank you for reading!
-Usman Dankoly, EHB Brussel