Papers and Publications

Here are the latest manuscripts/publications by Hadi Jahanshahi. The source code and the draft of most of them are publicly available. Please contact me if you have any questions.

Google Scholar
A deep reinforcement learning approach for the meal delivery problem

Jahanshahi, H., Bozanta, A., Cevik, M., Kavuk, E. M., Tosun, A., Sonuc, S. B., ... & Başar, A. (2022). A deep reinforcement learning approach for the meal delivery problem. Knowledge-Based Systems, 243, 108489.

In our study, we developed a Markov decision process (MDP) model for a meal delivery service, focusing on optimal courier assignments and delivery strategies. Using deep reinforcement learning, we evaluated the model on both synthetic and real-world data, comparing it with baseline approaches. Our model uniquely allows for intelligent order rejection and strategic re-positioning of couriers, improving delivery efficiency. We also explored the ideal number of couriers needed per hour, balancing customer demand with cost efficiency, optimizing courier routes, and considering factors like restaurant locations, customer destinations, and depot proximity. Our findings, derived from extensive numerical experiments with various deep Q-Networks algorithms, offer significant insights and practical applications for the meal delivery industry. However, due to confidentiality agreements, the source code remains proprietary.

ADPTriage: Approximate Dynamic Programming for Bug Triage

Jahanshahi, H., Cevik, M., Mousavi, K., & Başar, A. (2023). ADPTriage: Approximate Dynamic Programming for Bug Triage. IEEE Transactions on Software Engineering, 49, 4594-4609.

Bug triaging is a critical task in any software development project. It entails triagers going over a list of open bugs, deciding whether each should be addressed, and, if so, which developer should fix it. In this study, we develop a Markov decision process (MDP) model for an online bug triage task. In addition to an optimization-based myopic technique, we provide an ADP-based bug triage solution, ADPTriage, which models the downstream uncertainty in the bug arrivals and developers' timetables. Specifically, without imposing any limits on the underlying stochastic process, this technique enables real-time decision-making on bug assignments while taking into consideration developers' expertise, bug type, and bug fixing time. We leverage Approximate Dynamic Programming, Simulation, and Optimization techniques to address this problem.


nTreeClus: A tree-based sequence encoder for clustering categorical series

Jahanshahi, H., & Baydogan, M. G. (2022). nTreeClus: A tree-based sequence encoder for clustering categorical series. Neurocomputing, 494, 224-241.

In this work, we introduce a novel sequence clustering algorithm called nTreeClus. It leverages tree-based algorithms and autoregression to find clusters within any categorical sequence dataset. We apply it to different sequence datasets, including DNA, protein sequences, coronavirus genomes, wage levels, and travel behaviour datasets. The code, together with a step-by-step implementation, is provided on my GitHub page.

Wayback Machine: A tool to capture the evolutionary behavior of the bug reports and their triage process in open-source software systems

Jahanshahi, H., Cevik, M., Navas-Sú, J., Başar, A., & González-Torres, A. (2022). Wayback Machine: A tool to capture the evolutionary behavior of the bug reports and their triage process in open-source software systems. Journal of Systems and Software, 189, 111308.

In this work, we design a tool to explore and regenerate all the bug triage decisions in open-source software issue tracking systems. In particular, we create this tool for Bugzilla, and the tool can be utilized by other researchers to compare their novel bug triage methods with the literature and actual decisions. We leverage Simulation techniques, NLP, and Object-Oriented programming in Python to build this tool.


Can transit investments in low-income neighbourhoods increase transit use? Exploring the nexus of income, car-ownership, and transit accessibility in Toronto

Barri, E. Y., Farber, S., Kramer, A., Jahanshahi, H., Allen, J., & Beyazit, E. (2021). Can transit investments in low-income neighbourhoods increase transit use? Exploring the nexus of income, car-ownership, and transit accessibility in Toronto. Transportation Research Part D: Transport and Environment, 95, 102849.

We explore transit investment opportunities in the Greater Toronto and Hamilton Area, Ontario, Canada. This study is based on data from the Transportation Tomorrow Survey (TTS), the largest travel survey in the world. Our main focus is on vulnerable groups, i.e., low-income carless people. We use the zero-inflated negative binomial (ZINB) model and GIS tools to address the problem. The findings indicate that improving transit in low-income inner suburbs, where most low-income car-owning households live, would align social with environmental planning goals.
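
For readers unfamiliar with the ZINB model, the sketch below shows its probability mass function for a count outcome. It is only an illustration, not the code from the paper: the parameter names (`pi`, `mu`, `r`) and the numeric values are assumptions chosen for the example.

```python
import math


def nb_pmf(k, mu, r):
    """Negative binomial pmf with mean mu and dispersion (size) r."""
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    return math.exp(log_p)


def zinb_pmf(k, pi, mu, r):
    """Mixture: structural zero with probability pi, otherwise a negative binomial count."""
    base = (1.0 - pi) * nb_pmf(k, mu, r)
    return pi + base if k == 0 else base


# Hypothetical parameters: 30% structural zeros, mean count 2, dispersion 1.5.
probs = [zinb_pmf(k, pi=0.3, mu=2.0, r=1.5) for k in range(200)]
```

The extra mass at zero is what separates the ZINB from a plain negative binomial: some observations are zero by structure, while the rest follow the usual count distribution.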

S-DABT: Schedule and Dependency-Aware Bug Triage in Open-Source Bug Tracking Systems

Jahanshahi, H., & Cevik, M. (2022). S-DABT: Schedule and Dependency-Aware Bug Triage in Open-Source Bug Tracking Systems. Information and Software Technology.

Fixing bugs in a timely manner lowers various potential costs in software maintenance. We propose the Schedule and Dependency-aware Bug Triage (S-DABT), a bug triaging method that utilizes integer programming and machine learning techniques to assign bugs to suitable developers. Our approach takes into account the textual data (NLP), bug fixing costs (Collaborative Filtering), and bug dependencies (Graph Analysis). We further incorporate developers' schedules in our formulation to have a more comprehensive model for this multifaceted problem (Gurobi in Python). Via the simulation of the issue tracking system, we also show how incorporating the schedule in the model formulation reduces the bug fixing time, improves the assignment accuracy, and makes better use of each developer's capacity.


Understanding transit ridership in an equity context through a comparison of statistical and machine learning algorithms

Barri, E. Y., Farber, S., Jahanshahi, H., & Beyazit, E. (2022). Understanding transit ridership in an equity context through a comparison of statistical and machine learning algorithms. Journal of Transport Geography, 105, 103482.

Building an accurate model of travel behaviour based on individuals' characteristics and built environment attributes is important for policy-making and transportation planning. We explore the opportunities of leveraging ML models compared to traditional approaches by investigating their performance, interpretability, and practical policy implications. Our findings reveal the great potential of ML algorithms for enhanced travel behaviour predictions for low-income strata without considerably sacrificing interpretability.

Auto response generation in online medical chat services

Jahanshahi, H., Kazmi, S., & Cevik, M. (2021). Auto response generation in online medical chat services. Journal of Healthcare Informatics Research. 118.

Auto reply suggestion helps physicians respond faster to patients in online medical chat services. In particular, this work intends to assist the Your Doctors Online company in designing a more convenient tool for its physicians, who consult patients on their health-related queries. We adopt Bidirectional Encoder Representations from Transformers (BERT), Seq2seq, and LSTM to address the auto-response suggestion problem. We achieve a precision of up to 85% for the responses suggested by our machine learning model. This work is supported by Mitacs through the Mitacs Accelerate Program (IT18383). The source code cannot be shared due to the agreement with the Your Doctors Online company.


How income and car ownership shape travel behaviour: Exploring daily activity patterns through clustering trip chain sequences

Barri, E. Y., Farber, S., Jahanshahi, H., Tiznado-Aitken, I., & Beyazit, E. (2023). How income and car ownership shape travel behaviour: Exploring daily activity patterns through clustering trip chain sequences. Transportmetrica A: Transport Science.

This study examines the relationship between income, car ownership, and travel behaviour to address transport equity and manage travel demand. Unlike previous studies, the analysis considers the trip purpose and mode of transportation together, using a clustering framework to identify similar activity patterns. The findings show that income and car ownership influence travel decisions and patterns. Low-income carless households often rely on public transit or carpooling for daily trips, with women typically responsible for caregiving activities. Low-income car owners tend to use their cars for commuting, while males from wealthy carless households use a combination of public transit and active transportation. Achieving transport equity requires tailored policies that address the specific needs and barriers different household types face. The novelty of the study is its methodology, which integrates trip purpose and mode use as a unit of analysis and examines non-work activities separately, contributing to a holistic approach to understanding travel behaviour.

Does chronology matter in JIT defect prediction?: A Partial Replication Study

Jahanshahi, H., Jothimani, D., Başar, A., & Cevik, M. (2019, September). Does chronology matter in JIT defect prediction? A partial replication study. In Proceedings of the Fifteenth International Conference on Predictive Models and Data Analytics in Software Engineering (pp. 90-99).

Just-In-Time (JIT) models, unlike the traditional defect prediction models, detect the fix-inducing changes (or defect-inducing changes). In this work, we aim to investigate the effect of code change properties on JIT models over time. We used Random Forest to train and test the JIT model and Brier Score (BS) and Area Under Curve (AUC) for performance measurement.
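
As a quick reference for the two performance measures named above, here is a minimal from-scratch sketch of the Brier Score and AUC. The labels and probabilities are made-up values for illustration, not data or code from the study.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between the predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """Chance that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = defect-inducing change) and predicted probabilities.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.6]
bs = brier_score(labels, probs)    # lower is better
area = auc(labels, probs)          # higher is better
```

The Brier Score rewards well-calibrated probabilities, while AUC only depends on how the changes are ranked, so the two can disagree about which model is better.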


DABT: A Dependency-aware Bug Triaging Method

Jahanshahi, H., Chhabra, K., Cevik, M., & Başar, A. (2021). DABT: A dependency-aware bug triaging method. In Evaluation and Assessment in Software Engineering (pp. 221-230).

In software engineering practice, fixing a bug promptly reduces the associated costs. On the other hand, the manual bug fixing process can be time-consuming, cumbersome, and error-prone. In this work, we introduce a bug triaging method, called Dependency-aware Bug Triaging (DABT), which leverages natural language processing and integer programming to assign bugs to appropriate developers.
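
To give a feel for the assignment step, the toy sketch below picks the bug-to-developer assignment with the highest total suitability subject to each developer's time capacity. It is not the DABT formulation: brute-force enumeration stands in for an integer programming solver, bug dependencies are omitted, and every name and number is hypothetical.

```python
from itertools import product

# Hypothetical inputs: suitability[bug][dev] (e.g. from a text model), fix time, capacity.
suitability = {"bug1": {"ann": 0.9, "bob": 0.4},
               "bug2": {"ann": 0.8, "bob": 0.5},
               "bug3": {"ann": 0.2, "bob": 0.7}}
fix_time = {"bug1": 3, "bug2": 3, "bug3": 2}
capacity = {"ann": 4, "bob": 5}


def best_assignment():
    """Enumerate every bug -> developer (or None) choice; keep the best feasible one."""
    bugs, devs = list(suitability), list(capacity) + [None]
    best, best_score = None, -1.0
    for choice in product(devs, repeat=len(bugs)):
        load = {d: 0 for d in capacity}
        score = 0.0
        for bug, dev in zip(bugs, choice):
            if dev is not None:          # None means the bug stays unassigned
                load[dev] += fix_time[bug]
                score += suitability[bug][dev]
        if all(load[d] <= capacity[d] for d in capacity) and score > best_score:
            best, best_score = dict(zip(bugs, choice)), score
    return best, best_score


plan, score = best_assignment()
```

Enumeration only works for a handful of bugs; at realistic sizes the same objective and capacity constraints would be handed to a solver.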

Classifying multi-level product categories using dynamic masking and transformer models

Ozyegen, O., Jahanshahi, H., Cevik, M., Bulut, B., Yigit, D., Gonen, F. F., & Başar, A. (2022). Classifying multi-level product categories using dynamic masking and transformer models. Journal of Data, Information and Management, 4(1), 71-85.

In an online shopping platform, a detailed categorization of the products greatly enhances user navigation. Online retailers also benefit from well-defined product categories as various sales and marketing operations such as special discounts and promotions can be easily done over a set of product categories. In this work, we use BERT, XLM, XLM-RoBERTa, and LSTM to automate product categorization for the Getir company. The source code cannot be shared due to the agreement with the Getir company. We achieve excellent prediction performance exceeding 90% accuracy and F1-score values.

Moving from cross-project defect prediction to heterogeneous defect prediction: a partial replication study

Jahanshahi, H., Cevik, M., & Başar, A. (2021). Moving from cross-project defect prediction to heterogeneous defect prediction: a partial replication study. In Proceedings of the 30th Annual International Conference on Computer Science and Software Engineering (CASCON '20). IBM Corp., USA (pp. 133–142).

Software defect prediction heavily relies on the metrics collected from software projects. Earlier studies often used machine learning techniques to build, validate, and improve bug prediction models using either a set of metrics collected within a project or across different projects. We aim to explore the feasibility of Heterogeneous Defect Prediction and finally compare its performance with that of its predecessor, Cross-Project Defect Prediction.

Text classification for predicting multi-level product categories

Jahanshahi, H., Ozyegen, O., Cevik, M., Bulut, B., Yigit, D., Gonen, F. F., & Başar, A. (2021). Text classification for predicting multi-level product categories. In Proceedings of the 31st Annual International Conference on Computer Science and Software Engineering. IBM Corp., USA (pp. 33–42).

In an online shopping platform, a detailed classification of the products facilitates user navigation. It also helps online retailers keep track of the price fluctuations in a certain industry or special discounts on a specific product category. Our numerical results indicate that when we have a multi-level classification task, dynamic masking of subcategories is effective in improving prediction accuracy. In addition, we observe that using bilingual product titles is generally beneficial, and neural network-based models perform significantly better than SVM and XGBoost models.
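
The general idea behind masking subcategories can be sketched as follows: once a parent category is chosen, only its own subcategories are allowed to receive probability. This is a simplified illustration under assumed names (`CHILDREN`, `SUBCATEGORIES`, and the logits are all hypothetical), not the implementation used in the paper.

```python
import math

# Hypothetical two-level taxonomy: parent category -> indices of its subcategories.
CHILDREN = {"food": [0, 1], "drinks": [2, 3]}
SUBCATEGORIES = ["snacks", "dairy", "water", "juice"]


def masked_softmax(logits, allowed):
    """Softmax over the allowed indices only; every other class gets probability 0."""
    peak = max(logits[i] for i in allowed)
    exps = {i: math.exp(logits[i] - peak) for i in allowed}
    total = sum(exps.values())
    return [exps.get(i, 0.0) / total for i in range(len(logits))]


def predict_subcategory(parent, sub_logits):
    """Pick the most probable subcategory among the children of the given parent."""
    probs = masked_softmax(sub_logits, CHILDREN[parent])
    best = max(range(len(probs)), key=probs.__getitem__)
    return SUBCATEGORIES[best], probs


# Unmasked, "juice" (index 3) has the largest logit, but it is not a child of "food".
label, probs = predict_subcategory("food", [1.0, 2.0, 0.5, 3.0])
```

The mask is called dynamic because it changes with the parent category of each product, so the classifier can never output a subcategory that contradicts the level above it.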

Predicting the Number of Reported Bugs in a Software Repository

Jahanshahi, H., Cevik, M., & Başar, A. (2020, May). Predicting the number of reported bugs in a software repository. In Canadian Conference on Artificial Intelligence (pp. 309-320). Springer, Cham.

Predicting the bug growth pattern is a complicated, unrelieved task that needs considerable attention. Advance knowledge of the likely number of bugs discovered in a software system helps software developers allocate sufficient resources at a convenient time. In this study, we examine eight different time series forecasting models, including Long Short-Term Memory neural networks (LSTM), auto-regressive integrated moving average (ARIMA), and Random Forest Regressor.

Active Learning for Multi-way Sensitivity Analysis with Application to Disease Screening Modeling

Cevik, M., Angco, S., Heydarigharaei, E., Jahanshahi, H., & Prayogo, N. (2022). Active Learning for Multi-way Sensitivity Analysis with Application to Disease Screening Modeling. Journal of Healthcare Informatics Research.

Sensitivity analysis is an important aspect of model development, as it can be used to assess the level of confidence associated with the outcomes of a study. In this study, we investigate machine learning-based approaches for speeding up sensitivity analysis. Furthermore, we apply feature selection methods to identify the relative importance of quantitative model parameters in terms of their predictive ability on the outcomes. Our experiments on ultrasound and Magnetic Resonance Imaging (MRI) for timely detection of breast cancer indicate that ensemble methods such as XGBoost consistently outperform other ML algorithms in the prediction task of the associated sensitivity analysis.
