Papers and Publications
Here are the latest manuscripts and publications by Hadi Jahanshahi. The source code and drafts of most of them are publicly available. Please contact me if you have any questions.
Google Scholar
In our study, we developed a Markov decision process (MDP) model for a meal delivery service, focusing on optimal courier assignments and delivery strategies. Using deep reinforcement learning, we evaluated the model on both synthetic and real-world data and compared it with baseline approaches. A distinctive feature of our model is that it allows intelligent order rejection and strategic repositioning of couriers, which improves delivery efficiency. We also explored the ideal number of couriers needed per hour, balancing customer demand against cost efficiency, and optimized courier routes while considering factors such as restaurant locations, customer destinations, and depot proximity. Our findings, derived from extensive numerical experiments with various Deep Q-Network (DQN) algorithms, offer significant insights and practical applications for the meal delivery industry. However, due to confidentiality agreements, the source code remains proprietary.
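All DQN-family algorithms train a Q-network toward a Bellman target built from the reward and the best next-state action value. The sketch below computes that target for a small batch of transitions; it is a generic illustration, and the three-action set (assign, reject, reposition), the rewards, and the Q-values are hypothetical rather than taken from the study's model.

```python
from typing import List, Sequence

def dqn_targets(rewards: Sequence[float],
                next_q_values: Sequence[Sequence[float]],
                dones: Sequence[bool],
                gamma: float = 0.99) -> List[float]:
    """Bellman targets y = r + gamma * max_a' Q_target(s', a') for a batch of transitions.

    Terminal transitions (done=True) do not bootstrap from the next state.
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets

# Hypothetical action set for one dispatch decision: [assign, reject, reposition]
batch_targets = dqn_targets(
    rewards=[5.0, -1.0],
    next_q_values=[[2.0, 0.5, 1.0], [3.0, 4.0, 0.0]],
    dones=[False, True],
    gamma=0.9,
)
```

Variants such as Double DQN change only how the next-state action is selected; the structure of the target stays the same.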
Bug triaging is a critical task in any software development project. It entails triagers going through a list of open bugs, deciding whether each should be addressed and, if so, which developer should fix it. In this study, we develop a Markov decision process (MDP) model for an online bug triage task. In addition to an optimization-based myopic technique, we provide an approximate dynamic programming (ADP)-based bug triage solution, ADPTriage, which models the downstream uncertainty in bug arrivals and developers' timetables. Without imposing any restrictions on the underlying stochastic process, this technique enables real-time bug assignment decisions that take into account developers' expertise, bug type, and bug fixing time. We combine ADP with simulation and optimization techniques to address this problem.
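In ADP with a post-decision state, each decision maximizes the immediate reward plus an approximate value of the state right after the decision, and that approximation is refined from observed or simulated outcomes. The sketch below shows this loop with a lookup-table value function. It is a generic illustration, not the ADPTriage model: the state (per-developer workload), the rewards, the discount factor, and the step size are all hypothetical simplifications.

```python
from typing import Dict, List, Tuple

State = Tuple[int, ...]  # hypothetical post-decision state: open workload (days) per developer

def choose_developer(workloads: List[int], fix_days: List[int], rewards: List[float],
                     value: Dict[State, float], gamma: float = 0.95) -> int:
    """Return the developer index maximizing immediate reward + discounted post-decision value."""
    best_dev, best_score = -1, float("-inf")
    for dev, (days, reward) in enumerate(zip(fix_days, rewards)):
        post = list(workloads)
        post[dev] += days  # assigning the bug adds its fixing time to this developer's queue
        score = reward + gamma * value.get(tuple(post), 0.0)
        if score > best_score:
            best_dev, best_score = dev, score
    return best_dev

def smooth_update(value: Dict[State, float], state: State, observed: float,
                  alpha: float = 0.1) -> None:
    """Move the stored value estimate toward a newly observed sample value."""
    value[state] = (1 - alpha) * value.get(state, 0.0) + alpha * observed

value: Dict[State, float] = {}
first = choose_developer([0, 0], fix_days=[3, 2], rewards=[1.0, 0.8], value=value)
# Suppose simulation reveals that overloading developer 0 is costly downstream.
smooth_update(value, (3, 0), observed=-5.0, alpha=1.0)
second = choose_developer([0, 0], fix_days=[3, 2], rewards=[1.0, 0.8], value=value)
```

Without a value estimate the choice is myopic (highest immediate reward); once the downstream penalty is learned, the same bug goes to the other developer.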
In this work, we introduce a novel sequence clustering algorithm called nTreeClus. It leverages tree-based algorithms and autoregression to find clusters in any categorical sequence dataset. We apply it to several sequence datasets, including DNA and protein sequences, coronavirus genomes, wage levels, and travel behaviour data. The code, together with a step-by-step implementation, is available on my GitHub page.
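To let a tree-based model work autoregressively on a categorical sequence, the sequence can be cut into sliding windows in which the earlier symbols predict the last one. The sketch below shows only this windowing step, assuming a fixed window length `n` (a hypothetical parameter name here); fitting the tree and clustering the resulting representations are not shown.

```python
from typing import List, Sequence, Tuple

def autoregressive_windows(seq: Sequence[str], n: int) -> List[Tuple[Tuple[str, ...], str]]:
    """Split a categorical sequence into sliding windows of length n.

    The first n-1 symbols of each window are the predictors and the last one is the
    target, turning an unlabeled sequence into a supervised dataset a tree can fit.
    """
    if n < 2:
        raise ValueError("window length n must be at least 2")
    return [(tuple(seq[i:i + n - 1]), seq[i + n - 1]) for i in range(len(seq) - n + 1)]

# A short DNA fragment used purely as an example input
rows = autoregressive_windows("ACGTAC", n=3)
```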
In this work, we design a tool to explore and regenerate all the bug triage decisions in open-source software issue tracking systems. In particular, we built this tool for Bugzilla, and other researchers can use it to compare their novel bug triage methods with the literature and with the actual decisions. We leverage simulation techniques, NLP, and object-oriented programming in Python to build this tool.
We explore transit investment opportunities in the Greater Toronto and Hamilton Area (GTHA), Ontario, Canada. This study is based on data from the Transportation Tomorrow Survey (TTS), the largest travel survey in the world. Our main focus is on vulnerable groups, i.e., low-income carless people. We use a zero-inflated negative binomial (ZINB) model and GIS tools to address the problem. The findings indicate that improving transit in low-income inner suburbs, where most low-income car-owning households live, would align social planning goals with environmental ones.
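A ZINB model treats a count (for example, the number of transit trips a person makes) as a mixture of structural zeros and a negative binomial count. The sketch below evaluates the ZINB probability mass function in the common NB2 parameterization (variance = mu + alpha * mu^2). In a regression, `mu` and `pi` would be linked to covariates (typically through log and logit links); here they are fixed, hypothetical values.

```python
import math

def zinb_pmf(k: int, mu: float, alpha: float, pi: float) -> float:
    """P(Y = k) under a zero-inflated negative binomial.

    mu: mean of the count (NB) component; alpha: dispersion, var = mu + alpha * mu**2;
    pi: probability of a structural zero (e.g., someone who never uses transit).
    """
    r = 1.0 / alpha           # NB "size" parameter
    p = r / (r + mu)          # NB success probability, chosen so that E[count] = mu
    log_nb = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
              + r * math.log(p) + k * math.log(1.0 - p))
    nb = math.exp(log_nb)
    return pi + (1.0 - pi) * nb if k == 0 else (1.0 - pi) * nb

# Hypothetical parameters for one traveller segment
probs = [zinb_pmf(k, mu=1.5, alpha=0.8, pi=0.3) for k in range(200)]
```

The zero-inflation term adds mass only at zero, so the overall mean is (1 - pi) * mu.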
Fixing bugs in a timely manner lowers various potential costs in software maintenance. We propose Schedule and Dependency-aware Bug Triage (S-DABT), a bug triaging method that uses integer programming and machine learning techniques to assign bugs to suitable developers. Our approach takes into account textual data (NLP), bug fixing costs (collaborative filtering), and bug dependencies (graph analysis). We further incorporate developers' schedules into the formulation, implemented with Gurobi in Python, to obtain a more comprehensive model of this multifaceted problem. By simulating the issue tracking system, we also show that incorporating schedules into the model formulation reduces bug fixing time, improves assignment accuracy, and makes better use of each developer's capabilities.
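The effect of a schedule constraint can be seen on a toy instance: the cheapest developer cannot take every bug if doing so exceeds their available hours. The sketch below solves such an instance by brute-force enumeration instead of an integer programming solver such as Gurobi; the costs, fixing hours, and capacities are hypothetical, and bug dependencies are not modelled.

```python
from itertools import product
from typing import List, Optional, Tuple

def assign_bugs(cost: List[List[float]], fix_hours: List[List[float]],
                capacity: List[float]) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Exhaustively solve a tiny schedule-aware assignment problem.

    cost[b][d]: estimated cost of developer d fixing bug b
    fix_hours[b][d]: hours developer d needs for bug b
    capacity[d]: hours developer d has available in the planning window
    Returns the cheapest feasible plan (plan[b] = developer of bug b) and its cost.
    """
    n_bugs, n_devs = len(cost), len(capacity)
    best, best_cost = None, float("inf")
    for plan in product(range(n_devs), repeat=n_bugs):
        load = [0.0] * n_devs
        for bug, dev in enumerate(plan):
            load[dev] += fix_hours[bug][dev]
        if any(load[d] > capacity[d] for d in range(n_devs)):
            continue  # this plan violates some developer's schedule
        total = sum(cost[bug][dev] for bug, dev in enumerate(plan))
        if total < best_cost:
            best, best_cost = plan, total
    return best, best_cost

cost = [[1, 4], [2, 3], [1, 5]]          # hypothetical costs: 3 bugs x 2 developers
fix_hours = [[4, 4], [4, 4], [4, 4]]     # every bug takes 4 hours in this toy case
plan, total = assign_bugs(cost, fix_hours, capacity=[8, 8])
```

Without the capacity check, developer 0 would receive all three bugs; with it, the bug with the smallest cost penalty moves to developer 1. Enumeration grows as (developers)^(bugs), which is why realistic instances need an integer programming solver.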
Building an accurate model of travel behaviour based on individuals' characteristics and built environment attributes is important for policy-making and transportation planning. We explore the opportunities of leveraging ML models, compared with traditional approaches, by investigating their performance, interpretability, and practical policy implications. Our findings reveal the great potential of ML algorithms for improving travel behaviour predictions for low-income strata without considerably sacrificing interpretability.
Auto-reply suggestions help physicians respond faster to patients in online medical chat services. In particular, this work aims to help Your Doctors Online design a more convenient tool for its physicians who consult patients on health-related queries. We adopt Bidirectional Encoder Representations from Transformers (BERT), Seq2seq, and LSTM to address the auto-response suggestion problem. We achieve a precision of up to 85% for the responses suggested by our machine learning model. This work is supported by Mitacs through the Mitacs Accelerate Program (IT18383). The source code cannot be shared due to the agreement with Your Doctors Online.
This study examines the relationship between income, car ownership, and travel behaviour to address transport equity and manage travel demand. Unlike previous studies, the analysis considers trip purpose and mode of transportation together, using a clustering framework to identify similar activity patterns. The findings show that income and car ownership influence travel decisions and patterns. Low-income carless households often rely on public transit or carpooling for daily trips, with women typically responsible for caregiving activities. Low-income car owners tend to use their cars for commuting, while men from wealthy carless households use a combination of public transit and active transportation. Achieving transport equity requires tailored policies that address the specific needs of, and barriers faced by, different household types. The novelty of the study lies in its methodology, which integrates trip purpose and mode use as a unit of analysis and examines non-work activities separately, contributing to a holistic approach to understanding travel behaviour.
Just-In-Time (JIT) models, unlike traditional defect prediction models, detect fix-inducing changes (also called defect-inducing changes). In this work, we investigate the effect of code change properties on JIT models over time. We use Random Forest to train and test the JIT models, and the Brier Score (BS) and the Area Under the ROC Curve (AUC) to measure performance.
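The two performance measures capture different things: the Brier Score checks how well the predicted probabilities are calibrated, while AUC checks only how well the model ranks defect-inducing changes above clean ones. The sketch below computes both from scratch on hypothetical labels and predicted probabilities; it is a plain illustration of the definitions, not the study's evaluation code.

```python
from typing import Sequence

def brier_score(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def roc_auc(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Probability that a random defect-inducing change scores above a random clean one (ties count 1/2)."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

y = [1, 0, 1, 0]            # hypothetical labels: 1 = fix-inducing change
p = [0.9, 0.2, 0.4, 0.6]    # hypothetical predicted probabilities
bs, auc = brier_score(y, p), roc_auc(y, p)
```

The pairwise AUC is quadratic in the number of changes, which is fine for illustration; library implementations use a sort-based computation instead.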
In software engineering practice, fixing a bug promptly reduces the associated costs. However, the manual bug triage process can be time-consuming, cumbersome, and error-prone. In this work, we introduce a bug triaging method called Dependency-aware Bug Triaging (DABT), which leverages natural language processing and integer programming to assign bugs to appropriate developers.
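Dependency awareness means that a bug blocking other bugs should be handled before the bugs it blocks. The sketch below uses Python's standard `graphlib` module to group hypothetical bugs into "waves" whose blockers are already resolved; it illustrates the dependency ordering only, and the integer-programming assignment itself is not shown. The bug IDs and the graph are made up for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each bug maps to the set of bugs that block it.
blocked_by = {
    "BUG-3": {"BUG-1"},
    "BUG-4": {"BUG-1", "BUG-2"},
    "BUG-5": {"BUG-4"},
}

ts = TopologicalSorter(blocked_by)
ts.prepare()  # also raises CycleError if the dependencies are circular
waves = []
while ts.is_active():
    ready = sorted(ts.get_ready())  # bugs whose blockers are all resolved
    waves.append(ready)
    ts.done(*ready)
```

Each wave could then be assigned to developers, so that no bug is scheduled ahead of one that blocks it.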
On an online shopping platform, a detailed categorization of products greatly improves user navigation. Online retailers also benefit from well-defined product categories, since sales and marketing operations such as special discounts and promotions can easily be applied to a set of product categories. In this work, we use BERT, XLM, XLM-RoBERTa, and LSTM to automate product categorization for Getir. We achieve excellent prediction performance, with accuracy and F1-score values exceeding 90%. The source code cannot be shared due to the agreement with Getir.
Software defect prediction relies heavily on metrics collected from software projects. Earlier studies often used machine learning techniques to build, validate, and improve bug prediction models using a set of metrics collected either within a project or across different projects. We explore the feasibility of Heterogeneous Defect Prediction and compare its performance with that of its predecessor, Cross-Project Defect Prediction.
On an online shopping platform, a detailed classification of products facilitates user navigation. It also helps online retailers track price fluctuations in a given industry or special discounts on a specific product category. Our numerical results indicate that, in a multi-level classification task, dynamic masking of subcategories improves prediction accuracy. In addition, we observe that using bilingual product titles is generally beneficial and that neural network-based models perform significantly better than SVM and XGBoost models.
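One common way to realize subcategory masking in a multi-level classifier is to restrict the subcategory softmax to the children of the predicted parent category, so the mask changes from item to item. The sketch below shows that idea under this assumption; the study's exact masking mechanism may differ, and the taxonomy and logits are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical two-level taxonomy: parent category -> allowed subcategories.
TAXONOMY: Dict[str, List[str]] = {
    "Beverages": ["Water", "Soda", "Juice"],
    "Snacks": ["Chips", "Nuts"],
}

def masked_subcategory_probs(logits: Dict[str, float], parent: str) -> Dict[str, float]:
    """Softmax over subcategory logits, keeping only the children of the predicted parent."""
    allowed = TAXONOMY[parent]
    m = max(logits[s] for s in allowed)  # subtract the max for numerical stability
    exps = {s: math.exp(logits[s] - m) for s in allowed}
    z = sum(exps.values())
    return {s: e / z for s, e in exps.items()}

# "Chips" has the highest raw logit, but it cannot be chosen once the parent is "Beverages".
logits = {"Water": 1.0, "Soda": 0.5, "Juice": 0.2, "Chips": 3.0, "Nuts": 2.5}
probs = masked_subcategory_probs(logits, parent="Beverages")
```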
Predicting bug growth patterns is a complicated, unresolved task that needs considerable attention. Advance knowledge of the likely number of bugs to be discovered in a software system helps developers allocate sufficient resources at the right time. In this study, we examine eight different time series forecasting models, including Long Short-Term Memory neural networks (LSTM), autoregressive integrated moving average (ARIMA), and Random Forest Regressor.
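To make the "integrated" and "autoregressive" parts of ARIMA concrete, the sketch below fits a simplified ARIMA(1,1,0) without a constant by conditional least squares and forecasts cumulative bug counts. This is a hand-rolled illustration, not the maximum-likelihood estimation that statistical packages use, and the bug counts are hypothetical.

```python
from typing import List, Sequence

def arima110_forecast(series: Sequence[float], steps: int = 1) -> List[float]:
    """Forecast with an ARIMA(1,1,0) model (no constant) fitted by conditional least squares.

    1) Difference the series once (the "I" part).
    2) Fit d_t = phi * d_{t-1} on the differences (the "AR(1)" part).
    3) Iterate the AR recursion forward and undo the differencing.
    """
    if len(series) < 3:
        raise ValueError("need at least 3 observations")
    d = [b - a for a, b in zip(series, series[1:])]
    num = sum(x1 * x0 for x0, x1 in zip(d, d[1:]))
    den = sum(x0 * x0 for x0 in d[:-1])
    phi = num / den if den else 0.0
    level, last_diff, out = float(series[-1]), float(d[-1]), []
    for _ in range(steps):
        last_diff = phi * last_diff   # next predicted increment
        level += last_diff            # undo differencing to get the next level
        out.append(level)
    return out

cumulative_bugs = [10, 14, 18, 22, 26]  # hypothetical cumulative bug counts
forecast = arima110_forecast(cumulative_bugs, steps=2)
```

With a perfectly linear history, the fitted phi is 1 and the forecast simply continues the trend of four new bugs per period.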
Sensitivity analysis is an important aspect of model development, as it can be used to assess the level of confidence associated with the outcomes of a study. In this study, we investigate machine learning-based approaches for speeding up sensitivity analysis. Furthermore, we apply feature selection methods to identify the relative importance of quantitative model parameters in terms of their ability to predict the outcomes. Our experiments on ultrasound and Magnetic Resonance Imaging (MRI) for timely detection of breast cancer indicate that ensemble methods such as XGBoost consistently outperform other ML algorithms in the prediction task associated with the sensitivity analysis.