Dr Saleti Sumalatha

Assistant Professor

Department of Computer Science and Engineering

Contact Details

sumalatha.s@srmap.edu.in

Office Location

SR Block, Level 5, Cabin No: 20

Education

2020
Ph.D.
National Institute of Technology, Warangal
India
2010
M.Tech
Annamacharya Institute of Science and Technology, JNTU Anantapur
India
2004
B.Tech
Narayana Engineering College, JNTU Hyderabad
India

Experience

  • January 2019 to March 2020 – Associate Professor – Geethanjali Institute of Science and Technology, Nellore, Andhra Pradesh.
  • July 2015 to December 2018 – Research Scholar – National Institute of Technology, Warangal, Telangana.
  • May 2012 to May 2015 – Associate Professor – Geethanjali Institute of Science and Technology, Nellore, Andhra Pradesh.
  • October 2010 to April 2012 – Assistant Professor – Narayana Engineering College, Nellore, Andhra Pradesh.
  • July 2004 to August 2008 – Assistant Professor – Narayana Engineering College, Nellore, Andhra Pradesh.

Research Interest

  • To implement a learning management system and study navigational patterns to enhance students' learning.
  • To develop incremental mining algorithms.

Awards

  • 2022 - Undergraduate students under my supervision received one gold medal and one silver medal (September 2022) for paper presentations at the Research Day organized by SRM University, AP, India.
  • 2022 - Undergraduate students under my supervision received a gold medal (January 2022) for paper presentation at the Research Day organized by SRM University, AP, India.
  • 2015 to 2018 - Ph.D. Fellowship - Ministry of Human Resource Development
  • 2010 – Secured first rank in the Master's degree programme.

Memberships

  • Life Member of ISTE

Publications

  • A fully decentralized federated adversarial vision transformer with blockchain and secure aggregation for visual-based intrusion and malware forensics

    Shiva, Fazad

    Journal, International Journal of Data Science and Analytics, 2026, Quartile: Q2, DOI Link

    This paper presents a fully decentralized federated adversarial vision transformer (ViT) framework for secure, privacy-preserving, and robust image-based malware classification. Unlike conventional federated learning that relies on centralized aggregation and remains vulnerable to privacy breaches and adversarial attacks, the proposed system employs blockchain-based decentralized aggregation integrated with secure multi-party computation. Encrypted local model updates are securely aggregated without a central server, while the blockchain ledger ensures transparency, tamper resistance, and trust. To further enhance security, a zero-knowledge proof-based mechanism validates masked model updates, enabling verifiable aggregation without exposing raw parameters. Clients reconstruct the global model through decentralized consensus, preventing direct access to others’ updates. Adversarial robustness is improved via client-side adversarial ViT training, incorporating projected gradient descent-generated malware images with clean samples, thereby reducing false classifications. Computational efficiency is achieved by leveraging pre-trained ViT variants for resource-constrained environments. Extensive experiments on Malimg, Microsoft BIG 2015, and Malevis datasets demonstrate superior performance, achieving accuracies of 98.30%, 98.93%, and 95.72%, respectively. Compared to centralized and federated adversarial ViTs, as well as state-of-the-art methods (FASe-ViT, FASNet, DM-Mal, Fed-Mal), the proposed framework consistently achieves higher accuracy, precision, recall, and F1-scores, while ensuring privacy, resilience, and decentralized trust.
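
As a toy illustration of the secure-aggregation idea the paper above relies on, the sketch below shows pairwise additive masking in plain NumPy: each client uploads only a masked update, yet the masks cancel in the aggregate. The client count, dimensions, and updates are illustrative assumptions, not the paper's protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

def pairwise_masks(n_clients, dim, seed=42):
    """Antisymmetric pairwise masks: what i adds for pair (i, j), j subtracts."""
    g = np.random.default_rng(seed)
    masks = {i: np.zeros(dim) for i in range(n_clients)}
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            r = g.normal(size=dim)
            masks[i] += r
            masks[j] -= r
    return masks

# Hypothetical local model deltas (flattened) from three clients.
updates = [rng.normal(size=8) for _ in range(3)]
masks = pairwise_masks(3, 8)

# Each client reveals only its masked update.
masked = [u + masks[i] for i, u in enumerate(updates)]

# The masks cancel in the aggregate, so the mean is recovered exactly.
aggregate = np.mean(masked, axis=0)
assert np.allclose(aggregate, np.mean(updates, axis=0))
```
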
  • Dynamic RBFN with vector attention-guided feature selection for spam detection in social media

    Elakkiya E., Saleti S., Balakrishnan A.

    Article, Complex and Intelligent Systems, 2026, DOI Link

    Online social media platforms have emerged as primary engagement channels for internet users, leading to increased dependency on social network information. This growing reliance has attracted cybercriminals, resulting in a surge of malicious activities such as spam. Consequently, there is a pressing need for efficient spam detection mechanisms. Although several techniques have been proposed for social network spam detection, spammers continually evolve their strategies to bypass these systems. In response, researchers have focused on extracting additional features to better identify spammer patterns. However, this often introduces feature redundancy and complexity, which traditional machine learning-based feature selection methods struggle to manage in highly complex datasets. To address this, we propose a novel attention network-based feature selection method that assigns weights to features based on their importance, reducing redundancy while retaining relevant information. Additionally, an adaptive Radial Basis Function Neural Network (RBFN) is employed for spam classification, enabling dynamic weight updates to reflect evolving spam behaviors. The proposed method is evaluated against state-of-the-art feature selection, deep learning models, and existing spam detection techniques using accuracy, F-measure, and false-positive rate. Experimental results demonstrate that our approach outperforms existing methods, offering superior performance in detecting spam on social networks.
  • Federated learning-based disease prediction: A fusion approach with feature selection and extraction

    Kapila R., Saleti S.

    Article, Biomedical Signal Processing and Control, 2025, DOI Link

    The ability to predict diseases is critical in healthcare for early intervention and better patient outcomes. Data security and privacy are significant concerns when classified medical data from several institutions is analyzed. Cooperative model training provided by Federated Learning (FL) preserves data privacy. In this study, we offer a fusion strategy for illness prediction, combining FL with ANOVA and Chi-Square Feature Selection (FS) and Linear Discriminant Analysis (LDA) Feature Extraction (FE) techniques. This research aims to use FS and FE techniques to improve prediction performance while exploiting the beneficial aspects of FL. A comprehensive analysis of the distributed data is ensured by updating aggregate models with information from all participating institutions. Through collaboration, a robust disease prediction system surpasses the limited possibilities of individual datasets. We assessed the fusion strategy on the Cleveland heart disease and diabetes datasets from the UCI repository. Compared to solo FL or conventional ML techniques, the fusion strategy improved prediction performance. Our proposed models, Chi-Square with LDA and ANOVA with LDA leveraging FL, exhibited exceptional performance on the diabetes dataset, achieving accuracy, precision, recall, and f1-score of 92.3%, 94.36%, 94.36%, and 94.36%, respectively. Similarly, on the Cleveland heart disease dataset, these models demonstrated significant performance, achieving accuracy, precision, recall, and f1-score of 88.52%, 87.87%, 90.62%, and 89.23%, respectively. The results have the potential to revolutionize disease prediction, maintain privacy, advance healthcare, and outperform state-of-the-art models.
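
A minimal sketch of the feature selection and extraction stage described above, using scikit-learn's chi-square selector and LDA; the federated aggregation itself is out of scope here, and the dataset and k are stand-in assumptions rather than the paper's setup.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

X, y = load_breast_cancer(return_X_y=True)  # stand-in for the UCI data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(chi2, k=10)),                      # Chi-Square FS
    ("extract", LinearDiscriminantAnalysis(n_components=1)),  # LDA FE
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_tr, y_tr)
print("held-out accuracy:", pipe.score(X_te, y_te))
```
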
  • Optimizing Deep Learning for Pneumonia Diagnosis Using Chest X-Ray Data

    Kapila R., Sunanda A.S., Saleti S., Elakkiya E.

    Book chapter, Sensor Data Analytics for Intelligent Healthcare Delivery, 2025, DOI Link

  • Enhancing Disease Prediction with Correctness-Driven Ensemble Models

    Kapila R., Saleti S.

    Article, SN Computer Science, 2025, DOI Link

    Heart disease is among the most dangerous and hazardous illnesses. Human lives can be spared if the disease is diagnosed early enough and treated properly. We propose an efficient ensemble model which classifies all records correctly on the benchmark datasets. The correctness is accomplished by using Anova-Principal Component Analysis (Anv-PCA) techniques with a Stacking Classifier (SC) to select and extract the best features. Recall is the most significant metric to evaluate in the medical area. The findings show that the proposed Anv-PCA with SC meets all of the correctness requirements in terms of accuracy, precision, recall, and f1-score, with the highest results compared with the existing approaches. Anv-PCA, a method for selecting and extracting features, is paired with an ensemble classification algorithm in the approach we propose, which makes use of the Cleveland heart disease UCI dataset. All patient records are correctly categorized using this method, fulfilling the required criteria for correctness. The proposed model is also validated on six other publicly available benchmark datasets, namely the diabetes, cardiovascular, Framingham, CBC, COVID-specific, and HD (comprehensive) datasets available in the UCI repository, on which it also meets the correctness requirements. The proposed approach outperforms all cutting-edge models.
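
A minimal sketch, under stand-in assumptions, of an ANOVA-plus-PCA pipeline feeding a stacking classifier as described above; the base learners and dataset are illustrative, not the paper's exact configuration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)  # stand-in for the Cleveland data

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
)
model = make_pipeline(SelectKBest(f_classif, k=10),  # ANOVA feature selection
                      PCA(n_components=5),           # feature extraction
                      stack)
model.fit(X, y)
```
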
  • Insights into Gun-Related Deaths: A Comprehensive Machine Learning Analysis

    Panchumarthi L.Y., Parchuri L., Saleti S.

    Conference paper, 2024 15th International Conference on Computing Communication and Networking Technologies, ICCCNT 2024, 2024, DOI Link

    This work employs both supervised and unsupervised machine learning techniques to examine firearm-related fatalities in the US and identify trends, patterns, and risk factors within the data. During the supervised learning phase, techniques such as logistic regression, decision trees, random forests, and neural networks were used to predict the kind of death (suicide, homicide, accidental, or unknown) based on demographic data like sex, age, race, place, and education. Findings show that the neural network and random forest models exhibit promising precision and recall values across several classes, and that they obtained the highest accuracy, reaching 79.88% and 83.59%, respectively. Using clustering techniques including Agglomerative clustering, K-means, and Gaussian mixture models, gun-related fatalities were categorized based on demographic and temporal data during the unsupervised learning stage. The analysis revealed distinct clusters of deaths, providing insights into the varying patterns and trends over time and across demographic groups. The K-means algorithm, with a silhouette score of 0.42, demonstrated meaningful separation among clusters. The research contributes to understanding the complex dynamics of gun-related deaths, shedding light on both individual risk factors and broader trends. However, further analysis could explore additional dimensions of the dataset or delve deeper into the interpretation of clustering results. The study also highlights how crucial it is to take into consideration the moral consequences and constraints of machine learning applications in complex fields like public health.
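
A minimal sketch of the clustering-plus-silhouette evaluation used above; the synthetic blobs stand in for the demographic features, and the cluster count is an assumption.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)  # stand-in data
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print("silhouette:", silhouette_score(X, labels))  # cluster separation score
```
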
  • Enhancing Forecasting Accuracy with a Moving Average-Integrated Hybrid ARIMA-LSTM Model

    Saleti S., Panchumarthi L.Y., Kallam Y.R., Parchuri L., Jitte S.

    Article, SN Computer Science, 2024, DOI Link

    This research presents a hybrid time series forecasting model that combines Long Short-Term Memory (LSTM) and Autoregressive Integrated Moving Average (ARIMA) models with moving averages. LSTM models are utilized for modelling stationary time series, while ARIMA models are used for non-stationary time series. While LSTM models are more suited for capturing long-term dependencies, ARIMA models are superior in catching short-term relationships in time series data. The hybrid model combines ARIMA's short-term dependency modelling with LSTM's long-term dependency modelling. This combination leads to more accurate predictions for time series data that are both stationary and non-stationary. In addition, several moving averages, including the Triple Exponential Moving Average (TEMA), Weighted Moving Average (WMA), Simple Moving Average (SMA), Kaufman Adaptive Moving Average (KAMA), MIDPOINT, and MIDPRICE, were examined to determine which gives the hybrid model the best precision. The study compares the hybrid model's forecasting performance to that of standalone ARIMA and LSTM models, in addition to other prominent forecasting approaches like linear regression and random forest. The findings indicate that the hybrid model surpasses the individual models and other benchmark methods, achieving increased precision in terms of Mean Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE). The research also investigates the impact of different hyperparameters and model configurations on forecast performance, giving insight into the ideal settings for the hybrid model. Overall, the proposed ARIMA-LSTM hybrid model with moving averages is a promising approach for accurate and reliable stock price forecasting, which has practical implications for financial decision-making and risk management.
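
The hybrid idea above can be sketched as follows: fit ARIMA, train a second model to predict the ARIMA residuals from their own lags, and add that predicted correction to the ARIMA forecast. Here an MLPRegressor stands in for the LSTM to keep dependencies light; the series, orders, and lag count are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=300)) + 0.5 * np.sin(np.arange(300) / 5)

arima = ARIMA(y, order=(2, 1, 2)).fit()   # short-term/linear structure
resid = arima.resid

# Lagged residual windows -> next residual (the part the LSTM would model).
lags = 5
Xr = np.array([resid[i:i + lags] for i in range(len(resid) - lags)])
yr = resid[lags:]
nn = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                  random_state=0).fit(Xr, yr)

# Hybrid one-step forecast = ARIMA forecast + predicted residual correction.
hybrid = arima.forecast(1)[0] + nn.predict(resid[-lags:].reshape(1, -1))[0]
print(hybrid)
```
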
  • Comparative Analysis of Optimization Algorithms for Feature Selection in Heart Disease Classification

    Kapila R., Saleti S.

    Conference paper, Lecture Notes in Networks and Systems, 2024, DOI Link

    In order to classify the Cleveland Heart Disease dataset, this study evaluates the performance of three optimization methods, namely Fruit Fly Optimization (FFO), Particle Swarm Optimization (PSO), and Grey Wolf Optimizer (GWO). The top 10 features are identified using FFO, with remarkable results for accuracy (88.52%), precision (87.88%), recall (90.63%), and f1-score (89.23%). After using PSO, the accuracy, precision, recall, specificity, and f1-score are all 85.25%, 87.10%, 84.38%, and 85.71% respectively. Finally, GWO is used, which results in precision, accuracy, recall, and f1-score values of 93.33%, 90.16%, 90.11%, 87.5%, and 90.32% respectively, highlighting its consistent superior performance. FFO shows competitive outcomes with notable accuracy and recall through a comparative examination. PSO displays comparable precision and recall while displaying a somewhat poorer accuracy. In contrast, GWO performs better than both FFO and PSO, displaying great accuracy and precision along with remarkable recall and specificity. These results provide important information on the effectiveness of feature selection methods utilized in optimization algorithms for heart disease classification. The study also highlights the need for more investigation into the potential of these optimization algorithms in many fields, broadening their use beyond the classification of disease. This kind of work might improve the progress of the field of feature selection and aid in the creation of better classification models.
  • Addressing Nutrition Gaps: An In-depth Study of Child Undernutrition

    Panchumarthi L.Y., Manike M., Saleti S., Singh P., Madala P.S., Varghese S.A.

    Conference paper, Proceedings - 2024 OITS International Conference on Information Technology, OCIT 2024, 2024, DOI Link

    Malnutrition among children poses a significant health issue, leading to both immediate and long-term adverse health consequences. Shockingly, around two out of every three children in India, residing in both rural and urban areas, experience undernourishment. Moreover, the future behavioral and psychological well-being of children is profoundly affected by the consequences of insufficient nutrient intake. The key observations during a child’s growth, focusing on Wasting, Stunting, and Underweight, highlight the severity of the problem. This study utilizes data from the government initiatives E-Sadhana and Anganwadi. By employing data mining and econometric techniques, the paper aims to elucidate the causal factors and provide a visual representation of the disparities in under-nutrition across the state, considering economic and social factors.
  • Comparative Study of Melanoma Disease Classification using Deep Learning

    Panchumarthi L.Y., Gogineni A.K., Saleti S., Parchuri L., Kakarala H.

    Conference paper, Proceedings - 2024 OITS International Conference on Information Technology, OCIT 2024, 2024, DOI Link

    Melanoma is one of the deadliest and fastest-growing diseases on the globe, taking many lives every year. The early identification of melanoma through dermoscopy images can notably enhance the chances of survival. However, due to factors including the absence of contrast between the lesions and the skin and the visual similarity between melanoma and nonmalignant lesions, reliable melanoma differentiation is extremely difficult. Therefore, the accuracy and productivity of pathologists can be significantly increased by implementing a trustworthy automated method for the detection of skin tumours. This study introduces a method that employs deep learning models for cancer detection. Furthermore, we evaluate and analyze the following six deep learning approaches: Inception-ResNetV2, CNN, VGG16, EfficientNet, Densenet201, and MobileNetV2. Two different datasets, ISIC and MNIST, were used to evaluate the suggested deep learning frameworks. The experimental results demonstrate the promising accuracy of our frameworks. This survey highlights significant datasets, benchmark challenges, and evaluation metrics related to skin lesion analysis, offering a thorough overview of the field.
  • Enhancing the Accuracy of Manufacturing Process Error Detection Through SMOTE-Based Oversampling Using Machine Learning and Deep Learning Techniques

    Boyapati S.V., Rakshitha G.B.S., Reddy M.R., Saleti S.

    Conference paper, International Conference on Integrated Circuits, Communication, and Computing Systems, ICIC3S 2024 - Proceedings, 2024, DOI Link

    A production competency study leads to a rise in the manufacturing sector's strategic emphasis. Developing semiconductor materials is a highly complex process that necessitates numerous evaluations, and the significance of product quality cannot be overemphasized. We put forward a number of methods for automatically creating a prognostic model that is effective at identifying equipment flaws throughout the semiconductor wafer fabrication process. The SECOM dataset is representative of semiconductor production procedures that go through numerous tests. The dataset contains imbalanced statistics, so our proposed methodology incorporates SMOTE (Synthetic Minority Over-sampling Technique) to mitigate the imbalance of the training dataset by leveling off any unbalanced attributes. Detecting faults in the manufacturing process improves semiconductor quality and testing efficiency, and both Machine Learning and Deep Learning algorithms are validated for this task. This is accomplished by collecting performance metrics during the development process. Our research report also highlights our effort to cut down on the training time for testing.
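
A minimal sketch of the SMOTE rebalancing step described above, using imbalanced-learn; the synthetic data stands in for SECOM, and the class weights are an assumption.

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Heavily imbalanced stand-in for the pass/fail labels in SECOM.
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05],
                           random_state=0)
print("before:", Counter(y))

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("after: ", Counter(y_res))  # minority class synthetically leveled up
```
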
  • A Survey on Occupancy-Based Pattern Mining

    Inaganti B., Saleti S.

    Conference paper, Lecture Notes in Networks and Systems, 2024, DOI Link

    Occupancy-based pattern mining has emerged as a significant research topic in recent times. This paper presents a comprehensive survey on occupancy, which serves as a measure to augment the significance of patterns. The survey covers various types of patterns, including frequent itemsets, high utility itemsets, frequent sequences, and high utility sequences, all in the context of occupancy. Additionally, the paper delves into techniques aimed at reducing the search space in the aforementioned pattern mining problems. These techniques are crucial for improving the efficiency and scalability of the mining process, especially when dealing with large-scale datasets. Furthermore, the paper discusses potential research extensions for occupancy-based pattern mining. These extensions could explore new applications, develop novel algorithms, or further enhance the effectiveness of occupancy as a measure for pattern evaluation. In general, this survey provides an important resource for researchers interested in understanding and advancing occupancy-based pattern mining techniques.
  • Leveraging ResNet for Efficient ECG Heartbeat Classification

    Panchumarthi L.Y., Padmanabhuni S., Saleti S.

    Conference paper, Proceedings - 2024 IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems: Harmonizing Signals, Data, and Energy: Bridging the Digital Future, SPICES 2024, 2024, DOI Link

    This paper provides a novel approach that uses a modified version of the ResNet architecture to classify heartbeats on an electrocardiogram (ECG). Padding, convolution, max pooling, convolutional blocks, average pooling, and fully connected layers are the six stages in the approach. The MIT-BIH Arrhythmia Database is used to test the approach on five different types of heartbeats: unclassifiable, supraventricular premature, premature ventricular contraction, fusion of ventricular and normal, and normal. The outcomes demonstrate that the suggested approach outperforms other current techniques like LSTM, CNN, and EfficientNet, achieving an accuracy of 98.6%. The performance, restrictions, and future directions of the model are also thoroughly examined in this work. By automating ECG heartbeat categorization with deep learning techniques, the article advances the field of cardiac diagnosis.
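
A minimal sketch of the kind of 1-D residual block such a ResNet-style ECG classifier stacks, written in Keras; the filter counts, beat length, and overall layout are illustrative assumptions, not the paper's exact architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    """Two 1-D convolutions with a skip connection added around them."""
    shortcut = x
    y = layers.Conv1D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv1D(filters, 3, padding="same")(y)
    if shortcut.shape[-1] != filters:                 # match channel count
        shortcut = layers.Conv1D(filters, 1, padding="same")(shortcut)
    return layers.Activation("relu")(layers.Add()([y, shortcut]))

inputs = tf.keras.Input(shape=(187, 1))               # one beat per sample
x = residual_block(inputs, 32)
x = layers.GlobalAveragePooling1D()(x)
outputs = layers.Dense(5, activation="softmax")(x)    # five beat classes
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```
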
  • Optimizing Recommendation Systems: Analyzing the Impact of Imputation Techniques on Individual and Group Recommendation Systems

    Bhushan Mada S.P., Tata R., Sree Reddy Thondapu S.T., Saleti S.

    Conference paper, Proceedings - 2024 IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems: Harmonizing Signals, Data, and Energy: Bridging the Digital Future, SPICES 2024, 2024, DOI Link

    In today's world, Recommendation Systems play a significant role in guiding and simplifying the decision-making process for individuals and groups. However, the presence of missing data in user-item interaction matrices poses a challenge to accurately identify user preferences and provide relevant suggestions. This is particularly true for group recommendation systems that cater to multiple users. To address this challenge, we have applied four imputation techniques to individual and group recommendation models, including User-based Collaborative filtering, Matrix factorization using Singular Value Decomposition, and deep learning-based models like Autoencoders. We evaluated the effectiveness of these techniques using root mean squared error and mean absolute error metrics and observed a significant impact on the quality of recommendations. Additionally, we implemented aggregation strategies like Borda count, Additive Utilitarian, Multiplicative Utilitarian, Least Misery, and Most Pleasure for Group Recommendations. We evaluated the performance of these strategies using satisfaction score and disagreement score. Overall, our findings suggest that imputation techniques can significantly improve the quality of recommendations in both individual and group recommendation systems.
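
The group aggregation strategies mentioned above reduce a member-by-item rating matrix to one score per item; a toy NumPy illustration of three of them follows (the ratings are made up).

```python
import numpy as np

# Rows = group members, columns = candidate items (made-up ratings).
R = np.array([[5.0, 1.0, 3.0],
              [2.0, 4.0, 3.0],
              [4.0, 2.0, 3.0]])

scores = {
    "additive utilitarian": R.sum(axis=0),  # maximize total satisfaction
    "least misery": R.min(axis=0),          # protect the unhappiest member
    "most pleasure": R.max(axis=0),         # favour the happiest member
}
for name, s in scores.items():
    print(f"{name}: recommend item {int(s.argmax())}")
```
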
  • Enhancing Customer Churn Prediction: Advanced Models and Resampling Techniques in Dynamic Business Environments

    Thotakura Y.C., Manikanta Yarramsetty D., Doppalapudi K.K., Shasank Alaparthi S., Saleti S.

    Conference paper, Intelligent Computing and Emerging Communication Technologies, ICEC 2024, 2024, DOI Link

    Customer churn analysis is critical for businesses looking to hold onto market share in today's dynamic business environment. The development of e-Finance presents additional difficulties for the traditional banking sector as the digital marketplace grows. Banks face several challenges, including fintech competition, dwindling client loyalty, and digital transformation. Bank managers can identify problems, identify potential churn customers early on, and develop effective retention strategies based on client traits and preferences by analyzing probable causes of bank customer turnover from multiple perspectives and building models for predicting churn. Not only banks but also large corporate sectors like telecommunication, and over-the-top (OTT) platforms do face customer churn. This study proposes the Random Leaf Model (RLM) and also explores the Logit Leaf Model (LLM), and Neural Network Ensemble Model, three sophisticated predictive modeling methodologies. Proactive strategies are necessary in today's marketplaces due to their competitive nature. The primary problem with current automatic churn prediction algorithms is the substantial gap between majority and minority class proportions in the datasets, which might lead to model bias in favor of the dominant class. The shortcomings of conventional churn analysis techniques underscore the necessity of implementing advanced cutting-edge algorithms to achieve precise forecasts.
  • A comparison of various machine learning algorithms and execution of flask deployment on essay grading

    Kotha U.M., Gaddam H., Siddenki D.R., Saleti S.

    Article, International Journal of Electrical and Computer Engineering, 2023, DOI Link

    Students’ performance can be assessed by grading the answers written by the students during their examination. Currently, students are assessed manually by teachers. This is a cumbersome task due to an increase in the student-teacher ratio. Moreover, due to the coronavirus disease (COVID-19) pandemic, most educational institutions have adopted online teaching and assessment. To measure the learning ability of a student, we need to assess them. The current grading system works well for multiple choice questions, but there is no grading system for evaluating essays. In this paper, we studied different machine learning and natural language processing techniques for automated essay scoring/grading (AES/G). Data imbalance is an issue which creates problems in predicting the essay score due to the uneven distribution of essay scores in the training data. We handled this issue using a random oversampling technique, which generates an even distribution of essay scores. Also, we built a web application using Flask and deployed the machine learning models. Subsequently, all the models were evaluated using accuracy, precision, recall, and F1-score. It is found that the random forest algorithm outperformed the other algorithms with an accuracy of 97.67%, precision of 97.62%, recall of 97.67%, and F1-score of 97.58%.
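
A rough illustration of the Flask deployment step mentioned above: a single /predict endpoint that loads a saved vectorizer and model and returns a score. The artifact file names and the JSON field are hypothetical.

```python
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
vectorizer = joblib.load("vectorizer.joblib")  # hypothetical saved artifacts
model = joblib.load("essay_model.joblib")

@app.route("/predict", methods=["POST"])
def predict():
    essay = request.get_json()["essay"]        # hypothetical request field
    score = model.predict(vectorizer.transform([essay]))[0]
    return jsonify({"score": int(score)})

if __name__ == "__main__":
    app.run()
```
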
  • Heart Disease Prediction Using Novel Quine McCluskey Binary Classifier (QMBC)

    Kapila R., Ragunathan T., Saleti S., Lakshmi T.J., Ahmad M.W.

    Article, IEEE Access, 2023, DOI Link

    Cardiovascular disease is the primary reason for mortality worldwide, responsible for around a third of all deaths. To assist medical professionals in quickly identifying and diagnosing patients, numerous machine learning and data mining techniques are utilized to predict the disease. Many researchers have developed various models to boost the efficiency of these predictions. Feature selection and extraction techniques are utilized to remove unnecessary features from the dataset, thereby reducing computation time and increasing the efficiency of the models. In this study, we introduce a new ensemble Quine McCluskey Binary Classifier (QMBC) technique for identifying patients diagnosed with some form of heart disease and those who are not. The QMBC model utilizes an ensemble of seven models, including logistic regression, decision tree, random forest, K-nearest neighbour, naive Bayes, support vector machine, and multilayer perceptron, and performs exceptionally well on binary class datasets. We employ feature selection and feature extraction techniques to accelerate the prediction process. We utilize Chi-Square and ANOVA approaches to identify the top 10 features and create a subset of the dataset. We then apply Principal Component Analysis to the subset to identify 9 prime components. We utilize an ensemble of all seven models and the Quine McCluskey technique to obtain the minimum Boolean expression for the target feature. The results of the seven models (x_0, x_1, x_2, ..., x_6) are considered independent features, while the target attribute is dependent. We combine the projected outcomes of the seven ML models and the target feature to form a new dataset. We apply the ensemble model to this dataset, utilizing the Quine McCluskey minimum Boolean equation built with an 80:20 train-to-test ratio. Our proposed QMBC model surpasses all current state-of-the-art models and previously suggested methods put forward by various researchers.
  • Optimizing fetal health prediction: Ensemble modeling with fusion of feature selection and extraction techniques for cardiotocography data

    Kapila R., Saleti S.

    Article, Computational Biology and Chemistry, 2023, DOI Link

    Cardiotocography (CTG) captures the fetal heart rate and the timing of uterine contractions. Throughout pregnancy, intelligent CTG categorization is crucial for monitoring fetal health and preserving proper fetal growth and development. Since CTG provides information on the fetal heartbeat and uterine contractions, which helps determine if the fetus is pathologic, obstetricians frequently use it to evaluate a child's physical health during pregnancy. In the past, obstetricians have manually analyzed CTG data, which is time-consuming and inaccurate. Developing a fetal health categorization model is therefore crucial, as it may help to speed up diagnosis and treatment and conserve medical resources. The CTG dataset is used in this study. To diagnose the illness, 7 machine learning models are employed, as well as ensemble strategies including voting and stacking classifiers. In order to choose and extract the most significant and critical attributes from the dataset, Feature Selection (FS) techniques like ANOVA and Chi-square, as well as Feature Extraction (FE) strategies like Principal Component Analysis (PCA) and Independent Component Analysis (ICA), are used. We used the Synthetic Minority Oversampling Technique (SMOTE) to balance the unbalanced dataset. The top 5 models are selected to forecast the illness, and these 5 models are used in ensemble methods such as voting and stacking classifiers, with AdaBoost and Random Forest (RF) as meta-classifiers for the Stacking Classifiers (SC). The proposed SC with RF as the meta-classifier, which incorporates Chi-square with PCA, outperformed all other state-of-the-art models, achieving scores of 98.79%, 98.88%, 98.69%, 96.32%, and 98.77% for accuracy, precision, recall, specificity, and f1-score, respectively.
  • An efficient ensemble-based Machine Learning for breast cancer detection

    Kapila R., Saleti S.

    Article, Biomedical Signal Processing and Control, 2023, DOI Link

    Breast cancer is a very severe type of cancer that often develops in breast cells. Despite substantial advancements in the management of symptomatic breast cancer over the past ten years, an effective predictive model for breast cancer prognosis is urgently needed. Precise prediction offers numerous advantages, including the ability to diagnose cancer at an early stage and to protect patients from needless medical care and related costs. In the medical field, recall is just as important as model accuracy; a model is not very good if its accuracy is high but its recall is low. To boost accuracy while still assigning equal weight to recall, we propose a model that ensembles Feature Selection (FS), Feature Extraction (FE), and 5 Machine Learning (ML) models. There are three stages in our proposed model. In the first stage, the Correlation Coefficient (CC) and Anova (Anv) feature selection methodologies are used to choose the features. In the second stage, Uniform Manifold Approximation and Projection (UMAP), t-distributed Stochastic Neighbour Embedding (t-SNE), and Principal Component Analysis (PCA) are applied to extract the features without compromising the crucial information. In the last stage, 5 ML models and ensemble models such as a Voting Classifier (VC) and Stacking Classifier (SC) predict the disease from the selected and extracted features. The results show that the proposed model, CC-Anv with PCA using a SC, outperformed all the existing methodologies with 100% accuracy, precision, recall, and f1-score.
  • An enhancement in the efficiency of disease prediction using feature extraction and feature selection

    Kapila R., Saleti S.

    Book chapter, Contemporary Applications of Data Fusion for Advanced Healthcare Informatics, 2023, DOI Link

    Cardiovascular diseases constitute one of the most dangerous and fatal illnesses. According to statistics, in 2019, 17.9 million deaths were reported from cardiovascular diseases. As a result, it is essential to detect the sickness early on to minimize the death rate. To handle data efficiently and precisely forecast the symptoms of illness, data mining and machine learning approaches may be applied. This study employs seven supervised machine learning (ML) techniques to anticipate heart disease. The adoption of ML algorithms is the study's main objective, along with investigating how feature extraction (FE) and feature selection (FS) methods might increase the effectiveness of ML models. The experimental results indicate that models with feature selection and extraction techniques outperformed the model with the entire set of features from the dataset. As a case study, the authors considered three additional datasets, namely Parkinson's, diabetes, and lung cancer, in addition to the Cleveland Heart Disease dataset. However, the main focus of this study is on predicting heart disease.
  • Analyzing the Health Data: An Application of High Utility Itemset Mining

    Padmavathi K., Saleti S., Tottempudi S.S.

    Conference paper, 2023 International Conference on Advances in Computation, Communication and Information Technology, ICAICCIT 2023, 2023, DOI Link

    A data science endeavour called "high utility pattern mining" entails finding important patterns based on different factors like profit, frequency, and weight. High utility itemsets are among the various patterns that have undergone thorough study. These itemsets must exceed a minimum threshold specified by the user. This is particularly useful in practical applications like retail marketing and web services, where items have diverse characteristics. High-utility itemset mining facilitates decision-making by uncovering patterns that have a significant impact. Unlike frequent itemset mining, which identifies commonly occurring itemsets, high-utility itemsets often include rare items in real-world applications. Considering the application to the medical field, data mining has been employed in various ways. In this context, the primary method involves analyzing a health dataset that spans from 2014 to 2017 in the United States. The dataset includes categories such as diseases, states, and deaths. By examining these categories and mortality rates, we can derive high-utility itemsets that reveal the causes of the most deaths. In conclusion, high-utility pattern mining is a data science activity that concentrates on spotting significant patterns based on objective standards. It has proven valuable in various fields, including the medical domain, where analyzing datasets can uncover high-utility itemsets related to mortality rates and causes of death.
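
A toy illustration of the utility measure the survey above is built on: an itemset's utility sums quantity times external (unit) utility over the transactions that contain it. The items, quantities, and unit utilities here are made up.

```python
# (item, quantity) pairs per record; names and numbers are made up.
transactions = [
    [("heart_disease", 3), ("diabetes", 1)],
    [("heart_disease", 2), ("stroke", 2)],
    [("diabetes", 4)],
]
unit_utility = {"heart_disease": 5.0, "diabetes": 2.0, "stroke": 4.0}

def utility(itemset, db):
    """Sum quantity * unit utility of itemset over transactions containing it."""
    total = 0.0
    for t in db:
        items = dict(t)
        if all(i in items for i in itemset):
            total += sum(items[i] * unit_utility[i] for i in itemset)
    return total

print(utility({"heart_disease"}, transactions))              # 25.0
print(utility({"heart_disease", "diabetes"}, transactions))  # 17.0
```
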
  • A Comparative Analysis of the Evolution of DNA Sequencing Techniques along with the Accuracy Prediction of a Sample DNA Sequence Dataset using Machine Learning

    Mohammed K.B., Boyapati S.V., Kandimalla M.D., Kavati M.B., Saleti S.

    Conference paper, 2023 2nd International Conference on Paradigm Shifts in Communications Embedded Systems, Machine Learning and Signal Processing, PCEMS 2023, 2023, DOI Link

    DNA is widely considered the blueprint of life: the instructions required for all life forms to evolve, breed, and thrive are found in DNA. Deoxyribonucleic acid, more commonly known as DNA, is among the most essential chemicals in living cells, a biological macro-molecule that encodes life's blueprint. Sequencing of DNA has progressed exponentially due to the immense increase in data production in today's world. By means of this paper, we intend to evaluate the evolution of DNA sequencing methods and perform a comparative analysis of modern-day DNA sequencing techniques against those of the past. We also illuminate the potential of machine learning in this domain by taking an exploratory approach and predicting the DNA sequence using a Multinomial Naive Bayes classifier.
  • A Comparison of Various Class Balancing and Dimensionality Reduction Techniques on Customer Churn Prediction

    Bhushan Mada S.P., Thimmireddygari N., Tata R., Thondapu S.R., Saleti S.

    Conference paper, 7th IEEE International Conference on Recent Advances and Innovations in Engineering, ICRAIE 2022 - Proceedings, 2022, DOI Link

    With the advancement of technology, companies are able to foresee much earlier which customers are going to leave their organization. This problem of customer churn prediction is handled in the current work. In the real world, data is not balanced, having more observations for a few classes and fewer observations for the others. However, giving equal importance to each class is really significant for building an efficient prediction model. Moreover, real-world data contains many attributes, meaning that the dimensionality is high. In the current paper, we discuss three data balancing techniques and two methods of dimensionality reduction, i.e., feature selection and feature extraction. Further, selecting the best machine learning model for churn prediction is an important issue, which has been dealt with in the current paper. We aim to improve the efficiency of customer churn prediction by evaluating various class balancing and dimensionality reduction techniques, and we evaluated the performance of the models using AUC curves and K-fold cross-validation techniques.
  • Ontology Based Food Recommendation

    Chivukula R., Lakshmi T.J., Sumalatha S., Reddy K.L.R.

    Conference paper, Smart Innovation, Systems and Technologies, 2022, DOI Link

    Eating right is the most crucial aspect of healthy living. A nutritious, balanced diet helps our bodies fight off diseases. Many lifestyle-related diseases, such as diabetes and thyroid disorders, can often be avoided by active living and better nutrition. Having diet-related knowledge is essential for all. With this motivation, an ontology for the food domain is discussed and developed in this work. The aim is to create an ontology model in the food domain to help people get the right recommendations about food, based on their health conditions, if any.
  • Mining High Utility Time Interval Sequences Using MapReduce Approach: Multiple Utility Framework

    Saleti S., Lakshmi T.J., Ahmad M.W.

    Article, IEEE Access, 2022, DOI Link

    Mining high utility sequential patterns is observed to be a significant research area in data mining. Several methods mine the sequential patterns while taking utility values into consideration. The patterns of this type can determine the order in which items were purchased, but not the time interval between them. The time interval among items is important for predicting the most useful real-world circumstances, including retail market basket data analysis, stock market fluctuations, DNA sequence analysis, and so on. There are very few algorithms for mining sequential patterns that consider both the utility and the time interval. However, they assume the same threshold for each item, maintaining the same unit profit. Moreover, with the rapid growth in data, the traditional algorithms cannot handle big data and are not scalable. To handle this problem, we propose a distributed three-phase MapReduce framework that considers multiple utilities and is suitable for handling big data. The time constraints are pushed into the algorithm instead of pre-defined intervals. Also, the proposed upper bound minimizes the number of candidate patterns during the mining process. The approach has been tested and the experimental results show its efficiency in terms of run time, memory utilization, and scalability.
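
This is not the paper's algorithm, but the map/shuffle/reduce skeleton that such distributed miners build on can be sketched in a few lines of plain Python, here totalling per-item utilities.

```python
from collections import defaultdict

records = [("a", 3), ("b", 1), ("a", 2), ("c", 5)]   # (item, utility) pairs

mapped = [(item, util) for item, util in records]    # map: emit key/value

shuffled = defaultdict(list)                         # shuffle: group by key
for key, value in mapped:
    shuffled[key].append(value)

reduced = {k: sum(v) for k, v in shuffled.items()}   # reduce: aggregate
print(reduced)  # {'a': 5, 'b': 1, 'c': 5}
```
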
  • Incremental mining of high utility sequential patterns using MapReduce paradigm

    Saleti S.

    Article, Cluster Computing, 2022, DOI Link

    High utility sequential pattern (HUSP) mining considers the nonbinary frequency values of items purchased in a transaction and the utility of each item. Incremental updates are very common in many real-world applications. Mining the high utility sequences by rerunning the algorithm every time the data grows is not a simple task. Moreover, the centralized algorithms for mining HUSPs incrementally cannot handle big data. Hence, an incremental algorithm for high utility sequential pattern mining using the MapReduce paradigm (MR-INCHUSP) is introduced in this paper. The proposed algorithm includes a backward mining strategy that profoundly handles the knowledge acquired from past mining results. Further, elicited from the co-occurrence relation between the items, novel sequence extension rules have been introduced to increase the speed of the mining process. The experimental results exhibit the performance of MR-INCHUSP on several real and synthetic datasets.
  • Constraint Pushing Multi-threshold Framework for High Utility Time Interval Sequential Pattern Mining

    Saleti S., Naga Sahithya N., Rasagna K., Hemalatha K., SaiCharan B., Karthik Upendra P.V.

    Conference paper, Communications in Computer and Information Science, 2022, DOI Link

    This paper aims to detect high utility sequential patterns including time intervals and multiple utility thresholds. There are many algorithms that mine sequential patterns considering the utility factor; these can find the order between the items purchased, but they exclude the time interval among items. Further, they consider only the same utility threshold for each item present in the dataset, which is not convincing, as it assigns equal importance to all the items. The time interval between items plays a vital role in forecasting the most valuable real-world situations, like the retail sector, market basket data analysis, etc. Recently, the UIPrefixSpan algorithm has been introduced to mine the sequential patterns including utility and time intervals. Nevertheless, it considers only a single minimum utility threshold, assuming the same unit profit for each item. Hence, to solve the aforementioned issues, in the current work, we propose the UIPrefixSpan-MMU algorithm, utilizing a pattern growth approach and four time constraints. The experiments done on real datasets prove that UIPrefixSpan-MMU is more efficient and linearly scalable for generating the time interval sequences with high utility.
  • Exploring Patterns and Correlations Between Cryptocurrencies and Forecasting Crypto Prices Using Influential Tweets

    Kumar M., Priya G.S., Gadipudi P., Agarwal I., Sumalatha S.

    Conference paper, Communications in Computer and Information Science, 2022, DOI Link

    The crypto market, as we know, is full of various kinds of investors and influencers. We all know the pizza incident in 2010, where a buyer purchased two pizzas at 10,000 BTC, worth nearly 80 million in current times. That describes how much the market has progressed in these 10-12 years. Drastic changes in the price of several coins in the past few years have brought many new investors into this market. The crypto market has highly volatile currencies: Bitcoin was around 5K INR in 2013, and by 2021 it had reached 48 lakh INR, which shows how volatile the market is. The dataset provides many fascinating and valuable insights that help us gather practical knowledge. As data scientists, we are very keen to understand such a market, whose data is unstable, keeps changing frequently, and forms new patterns with time. This introduction of new patterns with time makes the problem an interesting one and keeps motivating us to find valuable information. Through this manuscript, we analyze two specific crypto coins for a particular period, covering more than 2900 records. We found several interesting patterns in the dataset and explored the historical return using several statistical models. We plotted the opening and closing prices of the particular coin using NumPy, SciPy, and Matplotlib. We also predicted the price of the specific currency, plotted the predicted price line against the actual price line, and compared the prediction model with the fundamental price model. To do so, we used the Simple Exponential Smoothing (SES) model and performed sentiment analysis based on influential tweets on Twitter, which makes our prediction more accurate and reliable than existing techniques. Lastly, we used a linear regression model to establish the relationship between the returns of Ripple and Bitcoin.
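
A minimal sketch of the Simple Exponential Smoothing forecast mentioned above, using statsmodels; the price series is synthetic and the smoothing level is an assumption.

```python
import numpy as np
from statsmodels.tsa.holtwinters import SimpleExpSmoothing

rng = np.random.default_rng(1)
close = 100 + np.cumsum(rng.normal(0, 2, size=200))  # stand-in closing prices

fit = SimpleExpSmoothing(close).fit(smoothing_level=0.2, optimized=False)
print(fit.forecast(5))  # next five smoothed price estimates
```
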
  • Mining Spatio-Temporal Sequential Patterns Using MapReduce Approach

    Saleti S., RadhaKrishna P., JaswanthReddy D.

    Conference paper, Communications in Computer and Information Science, 2022, DOI Link

    Spatio-temporal sequential pattern mining (STSPM) plays an important role in many applications such as mobile health, criminology, social media, solar events, transportation, etc. Most of the current studies assume the data is located in a centralized database on which a single machine performs mining. Thus, the existing centralized algorithms are not suitable for the big data environment, where data cannot be handled by a single machine. In this paper, our main aim is to find out the Spatio-temporal sequential patterns from the event data set using a distributed framework suitable for mining big data. We proposed two distributed algorithms, namely, MR-STBFM (MapReduce based spatio-temporal breadth first miner), and MR-SPTreeSTBFM (MapReduce based sequential pattern tree spatio-temporal breadth first miner). These are the distributed algorithms for mining spatio-temporal sequential patterns using Hadoop MapReduce framework. A spatio-temporal tree structure is used in MR-SPTreeSTBFM for reducing the candidate generation cost. This is an extension to the proposed MR-STBFM algorithm. The tree structure significantly improves the performance of the proposed approach. Also, the top-most significant pattern approach has been proposed to mine the top-most significant sequential patterns. Experiments are conducted to evaluate the performance of the proposed algorithms on the Boston crime dataset.
  • Student Placement Chance Prediction Model using Machine Learning Techniques

    Manike M., Singh P., Madala P.S., Varghese S.A., Sumalatha S.

    Conference paper, 2021 5th Conference on Information and Communication Technology, CICT 2021, 2021, DOI Link

    Obtaining employment upon graduation from university is one of the highest, if not the highest, priorities for students and young adults. Developing a system that can help these individuals obtain placement advice, analyze labor market trends, and assist educational institutions in assessing growing fields and opportunities would serve immense value. With the emergence of heavily refined Data Mining techniques and Machine Learning boilerplates, a model based on predictive analysis can help estimate a variety of realistic and possible placement metrics, such as the types of companies a junior-year student can be placed in, or the companies that are likely to look for the specific skill sets of a student. Various attributes such as academic results, technical skills, training experiences, and projects can help with prediction. We applied the XGBoost technique, a structured or tabular data-focused approach that has recently dominated applied machine learning and Kaggle tournaments. XGBoost is a high-speed and high-performance implementation of gradient boosted decision trees. We created a model and ran numerous EDAs to determine whether a student will be placed or not, as well as in which type of organization he or she will be placed [Day Sharing, Dream, Super Dream, Marquee].
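
A minimal sketch of an XGBoost placed/not-placed classifier in the spirit of the paper above; the synthetic features stand in for attributes like academic results and technical skills.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
clf.fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```
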
  • Distributed Mining of High Utility Sequential Patterns with Negative Item Values

    Varma M., Sumalatha S., Reddy A.

    Article, International Journal of Advanced Computer Science and Applications, 2021, DOI Link

    Sequential pattern mining is widely used to solve various business problems, including frequent user click patterns, customer purchase analysis, gene microarray data analysis, etc. Many studies on these pattern mining problems aim to extract insightful data. Most studies have concentrated on high utility sequential pattern (HUSP) mining with positive values, without a distributed approach. All the existing solutions are centralized, which incurs greater computation and communication costs. In this paper, we introduce a novel algorithm for mining HUSPs including negative item values with a distributed approach. We use Hadoop MapReduce algorithms for processing the data in parallel. Various pruning techniques have been proposed to minimize the search space in a distributed environment, thus reducing the expense of processing. To our knowledge, no algorithm has been proposed to mine high utility sequential patterns with negative item values in a distributed environment. So, we design a novel algorithm called DHUSP-N (Distributed High Utility Sequential Pattern mining with Negative values). DHUSP-N can mine high utility sequential patterns considering negative item utilities from big data.
  • Distributed Mining of High Utility Time Interval Sequential Patterns with Multiple Minimum Utility Thresholds

    Saleti S., Tangirala J.L., Thirumalaisamy R.

    Conference paper, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2021, DOI Link

    In this paper, the problem of mining high utility time interval sequential patterns with multiple utility thresholds in a distributed environment is considered. Mining high utility sequential patterns (HUSP) is an emerging issue, and the existing HUSP algorithms can mine the order of items but do not consider the time interval between successive items. In real-world applications, time interval patterns provide more useful information than conventional HUSPs. Recently, we proposed the distributed high utility time interval sequential pattern mining (DHUTISP) algorithm using MapReduce in support of the big data environment. That algorithm was designed considering a single minimum utility threshold. It is not convincing to use the same utility threshold for all the items in the sequence, which means that all the items are given the same importance. Hence, in this paper, a new distributed framework is proposed to efficiently mine high utility time interval sequential patterns with multiple minimum utility thresholds (DHUTISP-MMU) using the MapReduce approach. The experimental results show that the proposed approach can efficiently mine HUTISPs with multiple minimum utility thresholds.
  • Distributed mining of high utility time interval sequential patterns using mapreduce approach

    Sumalatha S., Subramanyam R.B.V.

    Article, Expert Systems with Applications, 2020, DOI Link

    High Utility Sequential Pattern mining (HUSP) algorithms aim to find all the high utility sequences from a sequence database. Due to the large explosion of data, recently few distributed algorithms have been designed for mining HUSPs based on the MapReduce framework. However, the existing HUSP algorithms such as USpan, HUS-Span and BigHUSP are able to predict only the order of items, they do not predict the time between the items, that is, they do not include the time intervals between the successive items. But in a real-world scenario, time interval patterns provide more valuable information than conventional high utility sequential patterns. Therefore, we propose a distributed high utility time interval sequential pattern mining (DHUTISP) algorithm using the MapReduce approach that is suitable for big data. DHUTISP creates a novel time interval utility linked list data structure (TIUL) to efficiently calculate the utility of the resulting patterns. Moreover, two utility upper bounds, namely, remaining utility upper bound (RUUB) and co-occurrence utility upper bound (CUUB) are proposed to prune the unpromising candidates. We conducted various experiments to prove the efficiency of the proposed algorithm over both the distributed and non-distributed approaches. The experimental results show the efficiency of DHUTISP over state-of-the-art algorithms, namely, BigHUSP, AHUS-P, PUSOM and UTMining_A.
  • A MapReduce solution for incremental mining of sequential patterns from big data

    Saleti S., R.B.V. S.

    Article, Expert Systems with Applications, 2019, DOI Link

    Sequential Pattern Mining (SPM) is a popular data mining task with broad applications. With the advent of big data, traditional SPM algorithms are not scalable. Hence, many researchers have migrated to big data frameworks such as MapReduce and proposed distributed algorithms. However, the existing MapReduce algorithms assume the data is static and do not handle incremental database updates. Moreover, they re-mine the updated database whenever new sequences are inserted. In this paper, we propose an efficient distributed algorithm for incremental sequential pattern mining (MR-INCSPM) using the MapReduce framework that can handle big data. The proposed algorithm incorporates the backward mining approach that efficiently makes use of the knowledge obtained during the previous mining process. Also, based on the study of item co-occurrences, we propose the Co-occurrence Reverse Map (CRMAP) data structure. The issue of combinatorial explosion of candidate sequences is dealt with using the proposed CRMAP data structure. Besides, novel candidate generation and early pruning mechanisms are designed using CRMAP to speed up the mining process. The proposed algorithm is evaluated on both real and synthetic datasets. The experimental results prove the efficacy of MR-INCSPM with respect to processing time, memory, and pruning efficiency.
  • A novel mapreduce algorithm for distributed mining of sequential patterns using co-occurrence information

    Saleti S., Subramanyam R.B.V.

    Article, Applied Intelligence, 2019, DOI Link

    The Sequential Pattern Mining (SPM) problem is much studied and extended in several directions. With the tremendous growth in the size of datasets, traditional algorithms are not scalable. In order to solve the scalability issue, a few researchers have recently developed distributed algorithms based on MapReduce. However, the existing MapReduce algorithms require multiple rounds of MapReduce, which increases communication and scheduling overhead. Also, they do not address the issue of handling long sequences. They generate a huge number of candidate sequences that do not appear in the input database, which increases the search space and results in more candidate sequences for support counting. Our algorithm is a two-phase MapReduce algorithm that generates promising candidate sequences using pruning strategies. It also reduces the search space, and thus the support computation is effective. We make use of item co-occurrence information, and the proposed Sequence Index List (SIL) data structure helps in computing the support fast. The experimental results show that the proposed algorithm has better performance over the existing MapReduce algorithms for the SPM problem.
  • A novel bit vector product algorithm for mining frequent itemsets from large datasets using mapreduce framework

    Saleti S., Subramanyam R.B.V.

    Article, Cluster Computing, 2018, DOI Link

    View abstract ⏷

    Frequent itemset mining (FIM) is an interesting sub-area of research in the field of data mining. With the increase in the size of datasets, conventional FIM algorithms are no longer suitable, and efforts have been made to migrate to big data frameworks for designing algorithms using MapReduce-like computing paradigms. We are likewise interested in designing a MapReduce-based algorithm. Initially, our Parallel Compression algorithm makes the data simpler to handle. A novel bit vector data structure is proposed to maintain compressed transactions; it is formed by scanning the dataset only once. Our Bit Vector Product algorithm follows the MapReduce approach and effectively searches for frequent itemsets in a given list of transactions. Experimental results are presented to demonstrate the efficacy of our approach over some recent works.
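    To make the bit-vector idea concrete, here is a small, hedged sketch: each item gets a bit vector over transaction ids, built in a single scan, and the support of an itemset is the population count of the bitwise AND, the "product", of its items' vectors. The paper's compression step and MapReduce distribution are omitted.

        # Hypothetical single-machine sketch of the bit-vector product idea;
        # the published algorithm adds compression and MapReduce distribution.
        from collections import defaultdict
        from functools import reduce

        def build_bit_vectors(transactions):
            # One scan: item -> integer bitmask, bit t set iff the item
            # occurs in transaction t.
            vectors = defaultdict(int)
            for tid, transaction in enumerate(transactions):
                for item in transaction:
                    vectors[item] |= 1 << tid
            return vectors

        def support(vectors, itemset):
            # AND ("product") the items' bit vectors, then count set bits.
            masks = [vectors.get(item, 0) for item in itemset]
            return bin(reduce(lambda a, b: a & b, masks)).count("1") if masks else 0

        txns = [["a", "b"], ["a", "c"], ["a", "b", "c"]]
        vecs = build_bit_vectors(txns)
        print(support(vecs, ["a", "b"]))  # 2: transactions 0 and 2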

Patents

  • A system and method for detection and mitigation of cyber threats in social networking platforms

    Dr M Krishna Siva Prasad, Dr Elakkiya E, Dr Saleti Sumalatha

    Patent Application No: 202441036235, Date Filed: 07/05/2024, Date Published: 17/05/2024, Status: Published

  • A system and a method for an essay grading system

    Dr Saleti Sumalatha

    Patent Application No: 202241043045, Date Filed: 27/07/2022, Date Published: 19/08/2022, Status: Granted

  • System and Method for Mining of Constraint Based High Utility Time Interval Sequential Patterns

    Dr Saleti Sumalatha

    Patent Application No: 202241044001, Date Filed: 01/08/2022, Date Published: 19/08/2022, Status: Published

  • A system and a method for privacy-preserving disease prediction using a federated learning technique

    Dr Saleti Sumalatha

    Patent Application No: 202341076138, Date Filed: 08/11/2023, Date Published: 15/12/2023, Status: Published

  • A system and method for predicting customer churn using a random leaf model

    Dr Saleti Sumalatha

    Patent Application No: 202441036236, Date Filed: 07/05/2024, Date Published: 17/05/2024, Status: Published

  • An error detection system for manufacturing process and a method thereof

    Dr Saleti Sumalatha

    Patent Application No: 202441069636, Date Filed: 14/09/2024, Date Published: 15/11/2024, Status: Published

  • A method and system for disease prediction using machine learning models during medical diagnoses of patients

    Dr Saleti Sumalatha

    Patent Application No: 202441032199, Date Filed: 23/04/2024, Date Published: 26/04/2024, Status: Published

Scholars

Post-Doctoral Scholars

  • Dr Mohamad Mulham Belal

Doctoral Scholars

  • Mr Ramdas Kapila
  • Ms A Sai Sunanda

Interests

  • Artificial Intelligence
  • Data Science
  • Distributed Computing
  • Machine Learning

Education
2004
B.Tech
Narayana Engineering College, JNTU Hyderabad
India
2010
M.Tech
Annamacharya Institute of science and technology, JNTU Anantapur
India
2020
Ph.D.
National institute of Technology, Warangal
India
Experience
  • January 2019 to March 2020 - Associate Professor – Geethanjali Institute of Science and Technology, Nellore, Andhra Pradesh.
  • July 2015 to December 2018 – Research Scholar – National Institute of Technology, Warangal, Telangana.
  • May 2012 to May 2015 – Associate Professor – Geethanjali Institute of Science and Technology, Nellore, Andhra Pradesh.
  • October 2010 to April 2012 – Assistant Professor - Narayana Engineering College, Nellore, Andhra Pradesh.
  • July 2004 to August 2008 – Assistant Professor – Narayana Engineering College, Nellore, Andhra Pradesh.
Research Interests
  • To implement a learning management system and study the navigational patterns to enhance students learning.
  • To develop incremental mining algorithms.
Awards & Fellowships
  • 2022 - Undergraduate students under my supervision received one gold medal and one silver medal (September 2022) for paper presentation in Research Day organized by SRM University, AP, India.
  • 2022 - Undergraduate students under my supervision received a gold medal (January 2022) for paper presentation in Research Day organized by SRM University, AP, India.
  • 2015 to 2018 - Ph.D. Fellowship - Ministry of Human Resource Development
  • 2010 – Secured first rank in Master’s Degree.
Memberships
  • Life Member of ISTE
Publications
  • A fully decentralized federated adversarial vision transformer with blockchain and secure aggregation for visual-based intrusion and malware forensics

    Shiva, Fazad

    Journal, International Journal of Data science and analytics, 2026, Quartile: Q2DOI Link

    View abstract ⏷

    This paper presents a fully decentralized federated adversarial vision transformer (ViT) framework for secure, privacy-preserving, and robust image-based malware classification. Unlike conventional federated learning that relies on centralized aggregation and remains vulnerable to privacy breaches and adversarial attacks, the proposed system employs blockchain- based decentralized aggregation integrated with secure multi-party computation. Encrypted local model updates are securely aggregated without a central server, while the blockchain ledger ensures transparency, tamper resistance, and trust. To further enhance security, a zero-knowledge proof-based mechanism validates masked model updates, enabling verifiable aggregation without exposing raw parameters. Clients reconstruct the global model through decentralized consensus, preventing direct access to others’ updates. Adversarial robustness is improved via client-side adversarial ViT training, incorporating projected gradient descent-generated malware images with clean samples, thereby reducing false classifications. Computational efficiency is achieved by leveraging pre-trained ViT variants for resource-constrained environments. Extensive experiments on Malimg, Microsoft BIG 2015, and Malevis datasets demonstrate superior performance, achieving accuracies of 98.30%, 98.93%, and 95.72%, respectively. Compared to centralized and federated adversarial ViTs, as well as state-of-the-art methods (FASe-ViT, FASNet, DM-Mal, Fed-Mal), the proposed framework consistently achieves higher accuracy, precision, recall, and F1-scores, while ensuring privacy, resilience, and decentralized trust.
  • Dynamic RBFN with vector attention-guided feature selection for spam detection in social media

    Elakkiya E., Saleti S., Balakrishnan A.

    Article, Complex and Intelligent Systems, 2026, DOI Link

    View abstract ⏷

    Online social media platforms have emerged as primary engagement channels for internet users, leading to increased dependency on social network information. This growing reliance has attracted cybercriminals, resulting in a surge of malicious activities such as spam. Consequently, there is a pressing need for efficient spam detection mechanisms. Although several techniques have been proposed for social network spam detection, spammers continually evolve their strategies to bypass these systems. In response, researchers have focused on extracting additional features to better identify spammer patterns. However, this often introduces feature redundancy and complexity, which traditional machine learning-based feature selection methods struggle to manage in highly complex datasets. To address this, we propose a novel attention network-based feature selection method that assigns weights to features based on their importance, reducing redundancy while retaining relevant information. Additionally, an adaptive Radial Basis Function Neural Network (RBFN) is employed for spam classification, enabling dynamic weight updates to reflect evolving spam behaviors. The proposed method is evaluated against state-of-the-art feature selection, deep learning models, and existing spam detection techniques using accuracy, F-measure, and false-positive rate. Experimental results demonstrate that our approach outperforms existing methods, offering superior performance in detecting spam on social networks.
  • Federated learning-based disease prediction: A fusion approach with feature selection and extraction

    Kapila R., Saleti S.

    Article, Biomedical Signal Processing and Control, 2025, DOI Link

    View abstract ⏷

    The ability to predict diseases is critical in healthcare for early intervention and better patient outcomes. Data security and privacy significantly classified medical data from several institutions is analyzed. Cooperative model training provided by Federated Learning (FL), preserves data privacy. In this study, we offer a fusion strategy for illness prediction, combining FL with Anova and Chi-Square Feature Selection (FS) and Linear Discriminate Analysis (LDA) Feature Extraction (FE) techniques. This research aims to use FS and FE techniques to improve prediction performance while using the beneficial aspects of FL. A comprehensive analysis of the distributed data is ensured by updating aggregate models with information from all participating institutions. Through collaboration, a robust disease prediction system excels in the limited possibilities of individual datasets. We assessed the fusion strategy on the Cleveland heart disease and diabetes datasets from the UCI repository. Comparing the fusion strategy to solo FL or conventional ML techniques, the prediction performance a unique fusion methodology for disease prediction. Our proposed models, Chi-Square with LDA and Anova with LDA leveraging FL, exhibited exceptional performance on the diabetes dataset, achieving identical accuracy, precision, recall, and f1-score of 92.3%, 94.36%, 94.36, and 94.36%, respectively. Similarly, on the Cleveland heart disease dataset, these models demonstrated significant performance, achieving accuracy, precision, recall, and f1-score of 88.52%, 87.87%, 90.62, and 89.23%, respectively. The results have the potential to revolutionize disease prediction, maintain privacy, advance healthcare, and outperform state-of-the-art models.
  • Optimizing Deep Learning for Pneumonia Diagnosis Using Chest X-Ray Data

    Kapila R., Sunanda A.S., Saleti S., Elakkiya E.

    Book chapter, Sensor Data Analytics for Intelligent Healthcare Delivery, 2025, DOI Link

  • Enhancing Disease Prediction with Correctness-Driven Ensemble Models

    Kapila R., Saleti S.

    Article, SN Computer Science, 2025, DOI Link

    View abstract ⏷

    Heart disease is the most dangerous and hazardous one. Human lives can be spared if the disease is diagnosed early enough and treated properly. We propose an efficient ensemble model which classifies all records correctly on the benchmark datasets. The correctness is accomplished by using Anova-Principal Component Analysis (Anv-PCA) techniques with a Stacking Classifier (SC) to select and extract the best features. The most significant component to evaluate in the medical area is recall. The findings show that the proposed Anv-PCA with SC meets all of the correctness requirements in concepts of accuracy, precision, recall, and f1-score with the highest results compared with the existing approaches. Anv-PCA, a method for selecting and extracting features, is paired with an ensemble classification algorithm in the approach we propose, which makes use of the Cleveland heart disease UCI dataset. All patient records are correctly categorized using this method, fulfilling the required criteria for correctness. The proposed model is also validated on other six publicly available benchmark datasets for diabetes, cardiovascular, Framingham, CBC, COVID-specific and HD (comprehensive) datasets available in the UCI repository, which presently meets the correctness requirements. The proposed approach exceeds all cutting-edge models.
  • Insights into Gun-Related Deaths: A Comprehensive Machine Learning Analysis

    Panchumarthi L.Y., Parchuri L., Saleti S.

    Conference paper, 2024 15th International Conference on Computing Communication and Networking Technologies, ICCCNT 2024, 2024, DOI Link

    View abstract ⏷

    This work employs both supervised and unsupervised machine learning techniques to examine firearm-related fatalities in the US and identify trends, patterns, and risk factors within the data. During the supervised learning phase, techniques such as logistic regression, decision trees, random forests, and neural networks were used to predict the kind of death (suicide, homicide, accidental, or unknown) based on demographic data like sex, age, race, place, and education. Findings show that the neural network and random forest models exhibit promising precision and recall values across several classes, and that they obtained the highest accuracy, reaching 79.88% and 83.59%, respectively. Using clustering techniques including Agglomerative clustering, K-means, and Gaussian mixture models, gun-related fatalities were categorized based on demographic and temporal data during the unsupervised learning stage. The analysis revealed distinct clusters of deaths, providing insights into the varying patterns and trends over time and across demographic groups. The K-means algorithm, with a silhouette score of 0.42, demonstrated meaningful separation among clusters. The research contributes to understanding the complex dynamics of gun-related deaths, shedding light on both individual risk factors and broader trends. However, further analysis could explore additional dimensions of the dataset or delve deeper into the interpretation of clustering results. The study also highlights how crucial it is to take into consideration the moral consequences and constraints of machine learning applications in complex fields like public health.
  • Enhancing Forecasting Accuracy with a Moving Average-Integrated Hybrid ARIMA-LSTM Model

    Saleti S., Panchumarthi L.Y., Kallam Y.R., Parchuri L., Jitte S.

    Article, SN Computer Science, 2024, DOI Link

    View abstract ⏷

    This research provides a time series forecasting model that is hybrid which combines Long Short-Term Memory (LSTM) and Autoregressive Integrated Moving Average (ARIMA) models with moving averages. For modelling stationary time series, LSTM models are utilized, while modelling non-stationary time series is done using ARIMA models. While LSTM models are more suited for capturing long-term dependencies, ARIMA models are superior in catching short-term relationships in time series data. The hybrid model combines the short-term dependency modeling of ARIMA utilising LSTM’s long-term dependency modelling. This combination leads to greater accuracy predictions for time series data that are both stationary and non-stationary. Also, Triple Exponential Moving Average (TEMA), Weighted Moving Average (WMA), Simple Moving Average (SMA), and six other moving averages were examined to determine how well the hybrid model performed. Kaufman Adaptive Moving Average (KAMA), MIDPOINT, MIDPRICE individually helps to know which methods give much precision. The study compares the hybrid model’s predicting performance to that of standalone ARIMA and LSTM models, in addition to other prominent forecasting approaches like linear regression and random forest. The findings indicate that the hybrid model surpasses the individual models and other benchmark methods, achieving increased precision in terms of Mean absolute percentage error (MAPE) and Root mean squared error (RMSE). The research also investigates the impact of different hyperparameters and model configurations on performance forecasts, giving information about the ideal settings for the hybrid model. Overall, the proposed ARIMA-LSTM hybrid model with moving averages is a promising approach for accurate and reliable stock price forecasting, which has practical implications for financial decision-making and risk management.
  • Comparative Analysis of Optimization Algorithms for Feature Selection in Heart Disease Classification

    Kapila R., Saleti S.

    Conference paper, Lecture Notes in Networks and Systems, 2024, DOI Link

    View abstract ⏷

    In order to classify the Cleveland Heart Disease dataset, this study evaluates the performance of three optimization methods, namely Fruit Fly Optimization (FFO), Particle Swarm Optimization (PSO), and Grey Wolf Optimizer (GWO). The top 10 features are identified using FFO, with remarkable results for accuracy (88.52%), precision (87.88%), recall (90.63%), and f1-score (89.23%). After using PSO, the accuracy, precision, recall, specificity, and f1-score are all 85.25%, 87.10%, 84.38%, and 85.71% respectively. Finally, GWO is used, which results in precision, accuracy, recall, and f1-score values of 93.33%, 90.16%, 90.11%, 87.5%, and 90.32% respectively, highlighting its consistent superior performance. FFO shows competitive outcomes with notable accuracy and recall through a comparative examination. PSO displays comparable precision and recall while displaying a somewhat poorer accuracy. In contrast, GWO performs better than both FFO and PSO, displaying great accuracy and precision along with remarkable recall and specificity. These results provide important information on the effectiveness of feature selection methods utilized in optimization algorithms for heart disease classification. The study also highlights the need for more investigation into the potential of these optimization algorithms in many fields, broadening their use beyond the classification of disease. This kind of work might improve the progress of the field of feature selection and aid in the creation of better classification models.
  • Addressing Nutrition Gaps: An In-depth Study of Child Undernutrition

    Panchumarthi L.Y., Manike M., Saleti S., Singh P., Madala P.S., Varghese S.A.

    Conference paper, Proceedings - 2024 OITS International Conference on Information Technology, OCIT 2024, 2024, DOI Link

    View abstract ⏷

    Malnutrition among children poses a significant health issue, leading to both immediate and long-term adverse health consequences. Shockingly, around two out of every three children in India, residing in both rural and urban areas, experience undernourishment. Moreover, the future behavioral and psychological well-being of children is profoundly affected by the consequences of insufficient nutrient intake. The key observations during a child’s growth, focusing on Wasting, Stunting, and Underweight, highlight the severity of the problem. This study utilizes data from the government initiatives E-Sadhana and Anganwadi. By employing data mining and econometric techniques, the paper aims to elucidate the causal factors and provide a visual representation of the disparities in under-nutrition across the state, considering economic and social factors.
  • Comparative Study of Melanoma Disease Classification using Deep Learning

    Panchumarthi L.Y., Gogineni A.K., Saleti S., Parchuri L., Kakarala H.

    Conference paper, Proceedings - 2024 OITS International Conference on Information Technology, OCIT 2024, 2024, DOI Link

    View abstract ⏷

    Melanoma is one of the deadliest and fastest-growing diseases on the globe, taking many lives every year. The early identification of melanoma through dermoscopy images can notably enhance the chances of survival. However, due to factors including the absence of contrast between the lesions and the skin and the visual similarity between melanoma and nonmalignant lesions, reliable melanoma differentiation is extremely difficult. Therefore, the accuracy and productivity of pathologists can be significantly increased by implementing a trustworthy automated method for the detection of skin tumours. This study introduces a method that employs deep learning models for cancer detection. Furthermore, we evaluate and analyze the following six deep learning approaches: Inception-ResNetV2, CNN, VGG16, EfficientNet, Densenet201, and MobileNetV2. Two different datasets, ISIC and MNIST, were used to evaluate the suggested deep learning frameworks. The experimental results demonstrate the promising accuracy of our frameworks. This survey highlights significant datasets, benchmark challenges, and evaluation metrics related to skin lesion analysis, offering a thorough overview of the field.
  • Enhancing the Accuracy of Manufacturing Process Error Detection Through SMOTE-Based Oversampling Using Machine Learning and Deep Learning Techniques

    Boyapati S.V., Rakshitha G.B.S., Reddy M.R., Saleti S.

    Conference paper, International Conference on Integrated Circuits, Communication, and Computing Systems, ICIC3S 2024 - Proceedings, 2024, DOI Link

    View abstract ⏷

    A production competency study leads to a rise in the manufacturing sectors' strategic emphasis. Developing semiconductor materials is a highly complex approach that necessitates numerous evaluations. It is impossible to emphasize the significance of the quality of the product. We put up a number of methods for automatically creating a prognostic model that is effective at identifying equipment flaws throughout the semiconductor materials' wafer fabrication process. The SECOM dataset is representative of semiconductor production procedures that go through numerous tests performed. There are imbalanced statistics in the dataset, so our proposed methodology incorporates SMOTE (Synthetic Minority Over-sampling Technique) functionality that is introduced to mitigate the imbalance of the training dataset by leveling off any unbalanced attributes. Detecting faults in the manufacturing process improves semiconductor quality and testing efficiency, and is used to validate both approaches to Machine Learning and Deep Learning algorithms. This is accomplished by collecting performance metrics during the development process. Another aspect of our effort to cut down on the training time for testing is highlighted in our research report.
  • A Survey on Occupancy-Based Pattern Mining

    Inaganti B., Saleti S.

    Conference paper, Lecture Notes in Networks and Systems, 2024, DOI Link

    View abstract ⏷

    Occupancy-based pattern mining has emerged as a significant research topic in recent times. This paper presents a comprehensive survey on occupancy, which serves as a measure to augment the significance of patterns. The survey covers various types of patterns, including frequent itemsets, high utility itemsets, frequent sequences, and high utility sequences, all in the context of occupancy. Additionally, the paper delves into techniques aimed at reducing the search space in the aforementioned pattern mining problems. These techniques are crucial for improving the efficiency and scalability of the mining process, especially when dealing with large-scale datasets. Furthermore, the paper discusses potential research extensions for occupancy-based pattern mining. These extensions could explore new applications, explore novel algorithms, or further enhance the effectiveness of occupancy as a measure for pattern evaluation. In general, this survey provides an important resource for researchers interested in understanding and advancing occupancy-based pattern mining techniques.
  • Leveraging ResNet for Efficient ECG Heartbeat Classification

    Panchumarthi L.Y., Padmanabhuni S., Saleti S.

    Conference paper, Proceedings - 2024 IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems: Harmonizing Signals, Data, and Energy: Bridging the Digital Future, SPICES 2024, 2024, DOI Link

    View abstract ⏷

    This paper provides a novel approach that uses a modified version of the ResNet architecture to classify heartbeats on an electrocardiogram (ECG). Padding, convolution, max pooling, convolutional blocks, average pooling, and fully linked layers are the six processes in the approach. The MIT-BIH Arrhythmia Database is used to test the approach on five different types of heartbeats: unclassifiable, supraventricular premature, premature ventricular contraction, fusion of ventricular and normal, and normal. The outcomes demonstrate that the suggested approach outperforms other current techniques like LSTM, CNN, and EfficientNet, achieving an accuracy of 98.6%. The performance, restrictions, and future directions of the model are also thoroughly examined in this work. The automated ECG heartbeat categorization using deep learning techniques is one way that the article advances the field of cardiac diagnosis.
  • Optimizing Recommendation Systems: Analyzing the Impact of Imputation Techniques on Individual and Group Recommendation Systems

    Bhushan Mada S.P., Tata R., Sree Reddy Thondapu S.T., Saleti S.

    Conference paper, Proceedings - 2024 IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems: Harmonizing Signals, Data, and Energy: Bridging the Digital Future, SPICES 2024, 2024, DOI Link

    View abstract ⏷

    In today's world, Recommendation Systems play a significant role in guiding and simplifying the decision-making process for individuals and groups. However, the presence of missing data in user-item interaction matrices poses a challenge to accurately identify user preferences and provide relevant suggestions. This is particularly true for group recommendation systems that cater to multiple users. To address this challenge, we have applied four imputation techniques to individual and group recommendation models, including User-based Collaborative filtering, Matrix factorization using Singular Value Decomposition, and deep learning-based models like Autoencoders. We evaluated the effectiveness of these techniques using root mean squared error and mean absolute error metrics and observed a significant impact on the quality of recommendations. Additionally, we implemented aggregation strategies like Borda count, Additive Utilitarian, Multiplicative Utilitarian, Least Misery, and Most Pleasure for Group Recommendations. We evaluated the performance of these strategies using satisfaction score and disagreement score. Overall, our findings suggest that imputation techniques can significantly improve the quality of recommendations in both individual and group recommendation systems.
  • Enhancing Customer Churn Prediction: Advanced Models and Resampling Techniques in Dynamic Business Environments

    Thotakura Y.C., Manikanta Yarramsetty D., Doppalapudi K.K., Shasank Alaparthi S., Saleti S.

    Conference paper, Intelligent Computing and Emerging Communication Technologies, ICEC 2024, 2024, DOI Link

    View abstract ⏷

    Customer churn analysis is critical for businesses looking to hold onto market share in today's dynamic business environment. The development of e-Finance presents additional difficulties for the traditional banking sector as the digital marketplace grows. Banks face several challenges, including fintech competition, dwindling client loyalty, and digital transformation. Bank managers can identify problems, identify potential churn customers early on, and develop effective retention strategies based on client traits and preferences by analyzing probable causes of bank customer turnover from multiple perspectives and building models for predicting churn. Not only banks but also large corporate sectors like telecommunication, and over-the-top (OTT) platforms do face customer churn. This study proposes the Random Leaf Model (RLM) and also explores the Logit Leaf Model (LLM), and Neural Network Ensemble Model, three sophisticated predictive modeling methodologies. Proactive strategies are necessary in today's marketplaces due to their competitive nature. The primary problem with current automatic churn prediction algorithms is the substantial gap between majority and minority class proportions in the datasets, which might lead to model bias in favor of the dominant class. The shortcomings of conventional churn analysis techniques underscore the necessity of implementing advanced cutting-edge algorithms to achieve precise forecasts.
  • A comparison of various machine learning algorithms and execution of flask deployment on essay grading

    Kotha U.M., Gaddam H., Siddenki D.R., Saleti S.

    Article, International Journal of Electrical and Computer Engineering, 2023, DOI Link

    View abstract ⏷

    Students’ performance can be assessed based on grading the answers written by the students during their examination. Currently, students are assessed manually by the teachers. This is a cumbersome task due to an increase in the student-teacher ratio. Moreover, due to coronavirus disease (COVID-19) pandemic, most of the educational institutions have adopted online teaching and assessment. To measure the learning ability of a student, we need to assess them. The current grading system works well for multiple choice questions, but there is no grading system for evaluating the essays. In this paper, we studied different machine learning and natural language processing techniques for automated essay scoring/grading (AES/G). Data imbalance is an issue which creates the problem in predicting the essay score due to uneven distribution of essay scores in the training data. We handled this issue using random over sampling technique which generates even distribution of essay scores. Also, we built a web application using flask and deployed the machine learning models. Subsequently, all the models have been evaluated using accuracy, precision, recall, and F1-score. It is found that random forest algorithm outperformed the other algorithms with an accuracy of 97.67%, precision of 97.62%, recall of 97.67%, and F1-score of 97.58%.
  • Heart Disease Prediction Using Novel Quine McCluskey Binary Classifier (QMBC)

    Kapila R., Ragunathan T., Saleti S., Lakshmi T.J., Ahmad M.W.

    Article, IEEE Access, 2023, DOI Link

    View abstract ⏷

    Cardiovascular disease is the primary reason for mortality worldwide, responsible for around a third of all deaths. To assist medical professionals in quickly identifying and diagnosing patients, numerous machine learning and data mining techniques are utilized to predict the disease. Many researchers have developed various models to boost the efficiency of these predictions. Feature selection and extraction techniques are utilized to remove unnecessary features from the dataset, thereby reducing computation time and increasing the efficiency of the models. In this study, we introduce a new ensemble Quine McCluskey Binary Classifier (QMBC) technique for identifying patients diagnosed with some form of heart disease and those who are not diagnosed. The QMBC model utilizes an ensemble of seven models, including logistic regression, decision tree, random forest, K-nearest neighbour, naive bayes, support vector machine, and multilayer perceptron, and performs exceptionally well on binary class datasets. We employ feature selection and feature extraction techniques to accelerate the prediction process. We utilize Chi-Square and ANOVA approaches to identify the top 10 features and create a subset of the dataset. We then apply Principal Component Analysis to the subset to identify 9 prime components. We utilize an ensemble of all seven models and the Quine McCluskey technique to obtain the Minimum Boolean expression for the target feature. The results of the seven models ( x_0, x_1, x_2,⋖, x_6 ) are considered independent features, while the target attribute is dependent. We combine the projected outcomes of the seven ML models and the target feature to form a foaming dataset. We apply the ensemble model to the dataset, utilizing the Quine McCluskey minimum Boolean equation built with an 80:20 train-to-test ratio. Our proposed QMBC model surpasses all current state-of-the-art models and previously suggested methods put forward by various researchers.
  • Optimizing fetal health prediction: Ensemble modeling with fusion of feature selection and extraction techniques for cardiotocography data

    Kapila R., Saleti S.

    Article, Computational Biology and Chemistry, 2023, DOI Link

    View abstract ⏷

    Cardiotocography (CTG) captured the fetal heart rate and the timing of uterine contractions. Throughout pregnancy, CTG intelligent categorization is crucial for monitoring fetal health and preserving proper fetal growth and development. Since CTG provides information on the fetal heartbeat and uterus contractions, which helps determine if the fetus is pathologic or not, obstetricians frequently use it to evaluate a child's physical health during pregnancy. In the past, obstetricians have artificially analyzed CTG data, which is time-consuming and inaccurate. So, developing a fetal health categorization model is crucial as it may help to speed up the diagnosis and treatment and conserve medical resources. The CTG dataset is used in this study. To diagnose the illness, 7 machine learning models are employed, as well as ensemble strategies including voting and stacking classifiers. In order to choose and extract the most significant and critical attributes from the dataset, Feature Selection (FS) techniques like ANOVA and Chi-square, as well as Feature Extraction (FE) strategies like Principal Component Analysis (PCA) and Independent Component Analysis (ICA), are being used. We used the Synthetic Minority Oversampling Technique (SMOTE) approach to balance the dataset because it is unbalanced. In order to forecast the illness, the top 5 models are selected, and these 5 models are used in ensemble methods such as voting and stacking classifiers. The utilization of Stacking Classifiers (SC), which involve Adaboost and Random Forest (RF) as meta-classifiers for disease detection. The performance of the proposed SC with meta-classifier as RF model, which incorporates Chi-square with PCA, outperformed all other state-of-the-art models, achieving scores of 98.79%,98.88%,98.69%,96.32%, and 98.77% for accuracy, precision, recall, specificity, and f1-score respectively.
  • An efficient ensemble-based Machine Learning for breast cancer detection

    Kapila R., Saleti S.

    Article, Biomedical Signal Processing and Control, 2023, DOI Link

    View abstract ⏷

    Breast cancer is a very severe type of cancer that often develops in breast cells. Attempting to develop an effective predictive model for breast cancer prognosis prediction is urgently needed despite substantial advancements in the management of symptomatic breast cancer over the past ten years. The precise prediction will offer numerous advantages, including the ability to diagnose cancer at an early stage and protect patients from needless medical care and related costs. In the medical field, recall is just as important as model accuracy. Even more crucially in the medical area, a model is not very good if its accuracy is high but its recall is low. To boost accuracy while still assigning equal weight to recall, we proposed a model that ensembles Feature Selection (FS), Feature Extraction (FE), and 5 Machine Learning (ML) models. There are three steps in our proposed model. The Correlation Coefficient (CC) and Anova (Anv) feature selection methodologies to choose the features in the first stage. Applying Uniform Manifold Approximation and Projection (UMAP), t-distributed Stochastic Neighbour Embedding (t-SNE), and Principal Component Analysis (PCA) to extract the features in the second stage without compromising the crucial information. With 5 ML models and ensemble models such as Voting Classifier (VC) and Stacking Classifier (SC) after selecting and extracting features from the dataset to predict the disease will be the last stage. The results show that the proposed model CC-Anv with PCA using a SC outperformed all the existing methodologies with 100% accuracy, precision, recall, and f1-score.
  • An enhancement in the efficiency of disease prediction using feature extraction and feature selection

    Kapila R., Saleti S.

    Book chapter, Contemporary Applications of Data Fusion for Advanced Healthcare Informatics, 2023, DOI Link

    View abstract ⏷

    Cardiovascular diseases constitute one of the most dangerous and fatal illnesses. According to statistics, in 2019,17.9 million deaths are reportedfrom cardiovascular diseases. As a result, it is essential to detect the sickness early on to minimize the death rate. To handle data efficiently and precisely forecast the symptoms of illness, data mining and machine learning approaches may be applied. This study intends to employ seven supervised machine learning (ML) techniques to anticipate heart disease. The adoption of ML algorithms is the study's main objective and to investigate how feature extraction (FE) and feature selection (FS) methods might increase the effectiveness of ML models. The experimental results indicate that models withfeature selection and extraction techniques outperformed the model with the entire features from the dataset. As a case study, the authors considered three additional datasets, namely Parkinson's, diabetes, and lung cancer, in addition to the Cleveland Heart Disease dataset. However, the main focus of this study is on predicting heart disease.
  • Analyzing the Health Data: An Application of High Utility Itemset Mining

    Padmavathi K., Saleti S., Tottempudi S.S.

    Conference paper, 2023 International Conference on Advances in Computation, Communication and Information Technology, ICAICCIT 2023, 2023, DOI Link

    View abstract ⏷

    A data science endeavour called "high utility pattern mining"entails finding important patterns based on different factors like profit, frequency, and weight. High utility itemsets are among the various patterns that have undergone thorough study. These itemsets must exceed a minimum threshold specified by the user. This is particularly useful in practical applications like retail marketing and web services, where items have diverse characteristics. High-utility itemset mining facilitates decision- making by uncovering patterns that have a significant impact. Unlike frequent itemset mining, which identifies commonly oc- curring itemsets, high-utility itemsets often include rare items in real-world applications. Considering the application to the medical field, data mining has been employed in various ways. In this context, the primary method involves analyzing a health dataset that spans from 2014 to 2017 in the United States. The dataset includes categories such as diseases, states, and deaths. By examining these categories and mortality rates, we can derive high-utility itemsets that reveal the causes of the most deaths. In conclusion, high-utility pattern mining is a data science activity that concentrates on spotting significant patterns based on objective standards. It has proven valuable in various fields, including the medical domain, where analyzing datasets can uncover high-utility itemsets related to mortality rates and causes of death.Categories and Subject Descriptors - [Health Database Applica- tion] Data Mining
  • A Comparative Analysis of the Evolution of DNA Sequencing Techniques along with the Accuracy Prediction of a Sample DNA Sequence Dataset using Machine Learning

    Mohammed K.B., Boyapati S.V., Kandimalla M.D., Kavati M.B., Saleti S.

    Conference paper, 2023 2nd International Conference on Paradigm Shifts in Communications Embedded Systems, Machine Learning and Signal Processing, PCEMS 2023, 2023, DOI Link

    View abstract ⏷

    DNA is widely considered the blueprint of life. The instructions required for all life forms, to evolve, breed, and thrive are found in DNA. Deoxyribonucleic acid, more commonly known as DNA, is among the most essential chemicals in living cells. A biological macro-molecule is DNA, also known as deoxyri-bonucleic acid. Life's blueprint is encoded by it. Sequencing of DNA has exponentially progressed due to the immense increase in data production in today's world. By means of this paper, we intend to evaluate the evolution of DNA Sequencing methods and perform a comparative analysis of modern-day DNA sequencing techniques to the ones of the past. We also illuminate the potential of machine learning in this domain by taking an exploratory and predicting the DNA Sequence using a Multinomial Naive Bayes classifier.
  • A Comparison of Various Class Balancing and Dimensionality Reduction Techniques on Customer Churn Prediction

    Bhushan Mada S.P., Thimmireddygari N., Tata R., Thondapu S.R., Saleti S.

    Conference paper, 7th IEEE International Conference on Recent Advances and Innovations in Engineering, ICRAIE 2022 - Proceedings, 2022, DOI Link

    View abstract ⏷

    With the advancement of technology, companies are able to foresee the customers who are going to leave their organization much before. This problem of customer churn prediction is handled in the current work. In the real-world, data is not balanced, having more observations with respect to few classes and less observations in case of other classes. But, giving equal importance to each class is really significant to build an efficient prediction model. Moreover, real-world data contains many attributes meaning that the dimensionality is high. In the current paper, we discussed three data balancing techniques and two methods of dimensionality reduction i.e. feature selection and feature extraction. Further, selecting the best machine learning model for churn prediction is an important issue. This has been dealt in the current paper. Also, we aim to improve the efficiency of customer churn prediction by evaluating various class balancing and dimensionality reduction techniques. Moreover, we evaluated the performance of the models using AUC curves and K-fold cross-validation techniques.
  • Ontology Based Food Recommendation

    Chivukula R., Lakshmi T.J., Sumalatha S., Reddy K.L.R.

    Conference paper, Smart Innovation, Systems and Technologies, 2022, DOI Link

    View abstract ⏷

    Eating right is the most crucial aspect of healthy living. A nutritious, balanced diet keeps our bodies support fight off diseases. Many lifestyle related diseases such as diabetes and thyroid can often be avoided by active living and better nutrition. Having diet related knowledge is essential for all. With this motivation, an ontology related to food domain is discussed and developed in this work. The aim of this work is to create on ontology model in the food domain to help people in getting right recommendation about the food, based on their health conditions if any.
  • Mining High Utility Time Interval Sequences Using MapReduce Approach: Multiple Utility Framework

    Saleti S., Lakshmi T.J., Ahmad M.W.

    Article, IEEE Access, 2022, DOI Link

    View abstract ⏷

    Mining high utility sequential patterns is observed to be a significant research in data mining. Several methods mine the sequential patterns while taking utility values into consideration. The patterns of this type can determine the order in which items were purchased, but not the time interval between them. The time interval among items is important for predicting the most useful real-world circumstances, including retail market basket data analysis, stock market fluctuations, DNA sequence analysis, and so on. There are a very few algorithms for mining sequential patterns those consider both the utility and time interval. However, they assume the same threshold for each item, maintaining the same unit profit. Moreover, with the rapid growth in data, the traditional algorithms cannot handle the big data and are not scalable. To handle this problem, we propose a distributed three phase MapReduce framework that considers multiple utilities and suitable for handling big data. The time constraints are pushed into the algorithm instead of pre-defined intervals. Also, the proposed upper bound minimizes the number of candidate patterns during the mining process. The approach has been tested and the experimental results show its efficiency in terms of run time, memory utilization, and scalability.
  • Incremental mining of high utility sequential patterns using MapReduce paradigm

    Saleti S.

    Article, Cluster Computing, 2022, DOI Link

    View abstract ⏷

    High utility sequential pattern (HUSP) mining considers the nonbinary frequency values of items purchased in a transaction and the utility of each item. Incremental updates are very common in many real-world applications. Mining the high utility sequences by rerunning the algorithm every time when the data grows is not a simple task. Moreover, the centralized algorithms for mining HUSPs incrementally cannot handle big data. Hence, an incremental algorithm for high utility sequential pattern mining using MapReduce para-digm (MR-INCHUSP) has been introduced in this paper. The proposed algorithm includes the backward mining strategy that profoundly handles the knowledge acquired from the past mining results. Further, elicited from the co-occurrence relation between the items, novel sequence extension rules have been introduced to increase the speed of the mining process. The experimental results exhibit the performance of MR-INCHUSP on several real and synthetic datasets.
  • Constraint Pushing Multi-threshold Framework for High Utility Time Interval Sequential Pattern Mining

    Saleti S., Naga Sahithya N., Rasagna K., Hemalatha K., SaiCharan B., Karthik Upendra P.V.

    Conference paper, Communications in Computer and Information Science, 2022, DOI Link

    View abstract ⏷

    This paper aims to detect high utility sequential patterns including time intervals and multiple utility thresholds. There are many algorithms that mine sequential patterns considering utility factor, these can find the order between the items purchased but they exclude the time interval among items. Further, they consider only the same utility threshold for each item present in the dataset, which is not convincing to assign equal importance for all the items. Time interval of items plays a vital role to forecast the most valuable real-world situations like retail sector, market basket data analysis etc. Recently, UIPrefixSpan algorithm has been introduced to mine the sequential patterns including utility and time intervals. Nevertheless, it considers only a single minimum utility threshold assuming the same unit profit for each item. Hence, to solve the aforementioned issues, in the current work, we proposed UIPrefixSpan-MMU algorithm by utilizing a pattern growth approach and four time constraints. The experiments done on real datasets prove that UIPrefixSpan-MMU is more efficient and linearly scalable for generating the time interval sequences with high utility.
  • Exploring Patterns and Correlations Between Cryptocurrencies and Forecasting Crypto Prices Using Influential Tweets

    Kumar M., Priya G.S., Gadipudi P., Agarwal I., Sumalatha S.

    Conference paper, Communications in Computer and Information Science, 2022, DOI Link

    View abstract ⏷

    The Crypto market, as we know, is a market full of various kinds of investors and influencers. We all know the pizza incident in 2010 where a guy purchased two pizzas at 10000 BTC, which ranges nearly around 80 million in current times. That describes how much the market has progressed in these 10–12 years. You can see drastic changes in the price of several coins in the past few years, which brings in many new investors to invest their money in this market. Crypto Market has highly volatile currencies. Bitcoin was around 5K INR in 2013, and by year 2021, it reached 48 Lakhs INR, which shows how volatile the market is. The dataset provides many fascinating and valuable insights that help us gather practical knowledge. As data scientists, we are very keen to understand such a market whose data is unstable and keeps changing frequently and making out new patterns with time. This introduction of new patterns with time makes this problem an interesting one and keeps on motivating us to find some valuable information. So, through this manuscript, we tried to analyze two specific crypto coins for a particular period, including more than 2900 records. We found several interesting patterns in the dataset and explored the historical return using several statistical models. We plotted the opening and closing prices of the particular coin by using NumPy, SciPy, and Matplotlib. We also tried to make predictions of the cost of the specific currency and then plot the predicted price line with the actual price line and understand the difference in the prediction model with the fundamental price mode. To do so, we used the Simple Exponential Smoothing (SES) model and performed sentiment analysis based on influencing tweets on Twitter. That makes our prediction more accurate and more reliable than existing techniques. Lastly, we used a linear regression model to establish the relationship between the returns of Ripple and Bitcoin.
  • Mining Spatio-Temporal Sequential Patterns Using MapReduce Approach

    Saleti S., RadhaKrishna P., JaswanthReddy D.

    Conference paper, Communications in Computer and Information Science, 2022, DOI Link

    View abstract ⏷

    Spatio-temporal sequential pattern mining (STSPM) plays an important role in many applications such as mobile health, criminology, social media, solar events, transportation, etc. Most of the current studies assume the data is located in a centralized database on which a single machine performs mining. Thus, the existing centralized algorithms are not suitable for the big data environment, where data cannot be handled by a single machine. In this paper, our main aim is to find out the Spatio-temporal sequential patterns from the event data set using a distributed framework suitable for mining big data. We proposed two distributed algorithms, namely, MR-STBFM (MapReduce based spatio-temporal breadth first miner), and MR-SPTreeSTBFM (MapReduce based sequential pattern tree spatio-temporal breadth first miner). These are the distributed algorithms for mining spatio-temporal sequential patterns using Hadoop MapReduce framework. A spatio-temporal tree structure is used in MR-SPTreeSTBFM for reducing the candidate generation cost. This is an extension to the proposed MR-STBFM algorithm. The tree structure significantly improves the performance of the proposed approach. Also, the top-most significant pattern approach has been proposed to mine the top-most significant sequential patterns. Experiments are conducted to evaluate the performance of the proposed algorithms on the Boston crime dataset.
  • Student Placement Chance Prediction Model using Machine Learning Techniques

    Manike M., Singh P., Madala P.S., Varghese S.A., Sumalatha S.

    Conference paper, 2021 5th Conference on Information and Communication Technology, CICT 2021, 2021, DOI Link

    View abstract ⏷

    Obtaining employment upon graduation from uni-versity is one of the highest, if not the highest, priorities for students and young adults. Developing a system that can help these individuals obtain placement advice, analyze labor market trends, and assist educational institutions in assessing growing fields and opportunities would serve immense value. With the emergence of heavily refined Data Mining techniques and Machine Learning boiler plates, a model based on predictive analysis can help estimate a variety of realistic and possible placement metrics, such as the types of companies a junior year student can be placed in, or the companies that are likely to look for the specific skill sets of a student. Various attributes such as academic results, technical skills, training experiences, and projects can help predict purposes. We devised the XGBoost Technique, a structured or tabular data-focused approach that has recently dominated applied machine learning and Kaggle tournaments. XGBoost is a high-speed and high-performance implementation of gradient boosted decision trees. We created a model and ran numerous EDAs to determine whether the student will be placed or not, as well as in which type of organization he will be placed [Day Sharing, Dream, Super Dream, Marquee].
  • Distributed Mining of High Utility Sequential Patterns with Negative Item Values

    Varma M., Sumalatha S., reddy A.

    Article, International Journal of Advanced Computer Science and Applications, 2021, DOI Link

    View abstract ⏷

    The sequential pattern mining was widely used to solve various business problems, including frequent user click pattern, customer analysis of buying product, gene microarray data analysis, etc. Many studies were going on these pattern mining to extract insightful data. All the studies were mostly concentrated on high utility sequential pattern mining (HUSP) with positive values without a distributed approach. All the existing solutions are centralized which incurs greater computation and communication costs. In this paper, we introduce a novel algorithm for mining HUSPs including negative item values in support of a distributed approach. We use the Hadoop map reduce algorithms for processing the data in parallel. Various pruning techniques have been proposed to minimize the search space in a distributed environment, thus reducing the expense of processing. To our understanding, no algorithm was proposed to mine High Utility Sequential Patterns with negative item values in a distributed environment. So, we design a novel algorithm called DHUSP-N (Distributed High Utility Sequential Pattern mining with Negative values). DHUSP-N can mine high utility sequential patterns considering the negative item utilities from Bigdata.
  • Distributed Mining of High Utility Time Interval Sequential Patterns with Multiple Minimum Utility Thresholds

    Saleti S., Tangirala J.L., Thirumalaisamy R.

    Conference paper, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2021, DOI Link

    View abstract ⏷

    In this paper, the problem of mining high utility time interval sequential patterns with multiple utility thresholds in a distributed environment is considered. Mining high utility sequential patterns (HUSP) is an emerging issue and the existing HUSP algorithms can mine the order of items and they do not consider the time interval between the successive items. In real-world applications, time interval patterns provide more useful information than the conventional HUSPs. Recently, we proposed distributed high utility time interval sequential pattern mining (DHUTISP) algorithm using MapReduce in support of the BigData environment. The algorithm has been designed considering a single minimum utility threshold. It is not convincing to use the same utility threshold for all the items in the sequence, which means that all the items are given the same importance. Hence, in this paper, a new distributed framework is proposed to efficiently mine high utility time interval sequential patterns with multiple minimum utility thresholds (DHUTISP-MMU) using the MapReduce approach. The experimental results show that the proposed approach can efficiently mine HUTISPs with multiple minimum utility thresholds.
  • Distributed mining of high utility time interval sequential patterns using mapreduce approach

    Sumalatha S., Subramanyam R.B.V.

    Article, Expert Systems with Applications, 2020, DOI Link

    View abstract ⏷

    High Utility Sequential Pattern mining (HUSP) algorithms aim to find all the high utility sequences from a sequence database. Due to the large explosion of data, recently few distributed algorithms have been designed for mining HUSPs based on the MapReduce framework. However, the existing HUSP algorithms such as USpan, HUS-Span and BigHUSP are able to predict only the order of items, they do not predict the time between the items, that is, they do not include the time intervals between the successive items. But in a real-world scenario, time interval patterns provide more valuable information than conventional high utility sequential patterns. Therefore, we propose a distributed high utility time interval sequential pattern mining (DHUTISP) algorithm using the MapReduce approach that is suitable for big data. DHUTISP creates a novel time interval utility linked list data structure (TIUL) to efficiently calculate the utility of the resulting patterns. Moreover, two utility upper bounds, namely, remaining utility upper bound (RUUB) and co-occurrence utility upper bound (CUUB) are proposed to prune the unpromising candidates. We conducted various experiments to prove the efficiency of the proposed algorithm over both the distributed and non-distributed approaches. The experimental results show the efficiency of DHUTISP over state-of-the-art algorithms, namely, BigHUSP, AHUS-P, PUSOM and UTMining_A.
  • A MapReduce solution for incremental mining of sequential patterns from big data

    Saleti S., R.B.V. S.

    Article, Expert Systems with Applications, 2019, DOI Link

    View abstract ⏷

    Sequential Pattern Mining (SPM) is a popular data mining task with broad applications. With the advent of big data, traditional SPM algorithms are not scalable. Hence, many researchers have migrated to big data frameworks such as MapReduce and proposed distributed algorithms. However, the existing MapReduce algorithms assume the data is static and do not handle incremental database updates; they re-mine the entire updated database whenever new sequences are inserted. In this paper, we propose an efficient distributed algorithm for incremental sequential pattern mining (MR-INCSPM) using the MapReduce framework that can handle big data. The proposed algorithm incorporates a backward mining approach that efficiently reuses the knowledge obtained during the previous mining process. Also, based on a study of item co-occurrences, we propose the Co-occurrence Reverse Map (CRMAP) data structure. The combinatorial explosion of candidate sequences is dealt with using the proposed CRMAP data structure. Besides, novel candidate generation and early pruning mechanisms are designed using CRMAP to speed up the mining process. The proposed algorithm is evaluated on both real and synthetic datasets. The experimental results prove the efficacy of MR-INCSPM with respect to processing time, memory and pruning efficiency.
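    A minimal sketch of a co-occurrence reverse map is given below, assuming it records, for each item, the items that appear before it in at least one sequence; the exact CRMAP layout in the paper may differ.

    ```python
    # Hypothetical co-occurrence reverse map: item -> items seen before it
    # in at least one sequence, used to veto impossible backward extensions.
    from collections import defaultdict

    def build_crmap(database):
        crmap = defaultdict(set)
        for sequence in database:
            for i, item in enumerate(sequence):
                crmap[item].update(sequence[:i])  # all items preceding this one
        return crmap

    db = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
    crmap = build_crmap(db)
    print(sorted(crmap["c"]))  # ['a', 'b'] -> only a or b ever precedes c;
    # any other backward extension of 'c' is pruned without support counting.
    ```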
  • A novel mapreduce algorithm for distributed mining of sequential patterns using co-occurrence information

    Saleti S., Subramanyam R.B.V.

    Article, Applied Intelligence, 2019, DOI Link

    View abstract ⏷

    The Sequential Pattern Mining (SPM) problem has been much studied and extended in several directions. With the tremendous growth in the size of datasets, traditional algorithms are not scalable. To solve the scalability issue, a few researchers have recently developed distributed algorithms based on MapReduce. However, the existing MapReduce algorithms require multiple rounds of MapReduce, which increases communication and scheduling overhead. Also, they do not address the issue of handling long sequences: they generate a huge number of candidate sequences that do not appear in the input database, which enlarges the search space and leaves more candidate sequences for support counting. Our algorithm is a two-phase MapReduce algorithm that generates only promising candidate sequences using pruning strategies. It also reduces the search space, making support computation effective. We make use of item co-occurrence information, and the proposed Sequence Index List (SIL) data structure helps compute the support quickly. The experimental results show that the proposed algorithm outperforms the existing MapReduce algorithms for the SPM problem.
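    The sketch below shows the flavor of index-list-based support counting, assuming a structure that maps each item to the IDs of sequences containing it so that candidate support can be bounded by set intersection; the paper's SIL likely also tracks positions so that item order can be verified.

    ```python
    # Hypothetical item -> sequence-ID index; illustrative only.
    from collections import defaultdict

    def build_index(database):
        index = defaultdict(set)
        for sid, sequence in enumerate(database):
            for item in sequence:
                index[item].add(sid)
        return index

    db = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
    idx = build_index(db)
    # Upper bound on the support of <a, c>: sequences containing both items.
    print(len(idx["a"] & idx["c"]))  # 2
    ```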
  • A novel bit vector product algorithm for mining frequent itemsets from large datasets using mapreduce framework

    Saleti S., Subramanyam R.B.V.

    Article, Cluster Computing, 2018, DOI Link

    View abstract ⏷

    Frequent itemset mining (FIM) is an interesting sub-area of research in the field of data mining. As datasets grow in size, conventional FIM algorithms become unsuitable, and efforts have been made to migrate to big data frameworks and design algorithms using MapReduce-like computing paradigms. We are likewise interested in designing a MapReduce-based algorithm. Initially, our parallel compression algorithm makes the data simpler to handle. A novel bit vector data structure is proposed to maintain the compressed transactions, and it is formed by scanning the dataset only once. Our Bit Vector Product algorithm follows the MapReduce approach and effectively searches for frequent itemsets in a given list of transactions. Experimental results are presented to demonstrate the efficacy of our approach over some recent works.
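    A minimal sketch of bit-vector support counting follows: one bit per transaction for each item, with a bitwise AND yielding the transactions that contain an itemset. The encoding here is an illustrative assumption, not the paper's exact structure.

    ```python
    # Hypothetical bit-vector encoding: bit i of an item's vector is set iff
    # transaction i contains the item; ANDing vectors counts joint occurrences.
    db = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]

    def bit_vector(item, database):
        bits = 0
        for i, txn in enumerate(database):
            if item in txn:
                bits |= 1 << i
        return bits

    va, vb = bit_vector("a", db), bit_vector("b", db)
    print(bin(va & vb).count("1"))  # 2 transactions contain both a and b
    ```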

Scholars

Doctoral Scholars

  • Mr Ramdas Kapila
  • Ms A Sai Sunanda

Interests

  • Artificial Intelligence
  • Data Science
  • Distributed Computing
  • Machine Learning

  • Distributed Mining of High Utility Sequential Patterns with Negative Item Values

    Varma M., Sumalatha S., reddy A.

    Article, International Journal of Advanced Computer Science and Applications, 2021, DOI Link

    View abstract ⏷

    The sequential pattern mining was widely used to solve various business problems, including frequent user click pattern, customer analysis of buying product, gene microarray data analysis, etc. Many studies were going on these pattern mining to extract insightful data. All the studies were mostly concentrated on high utility sequential pattern mining (HUSP) with positive values without a distributed approach. All the existing solutions are centralized which incurs greater computation and communication costs. In this paper, we introduce a novel algorithm for mining HUSPs including negative item values in support of a distributed approach. We use the Hadoop map reduce algorithms for processing the data in parallel. Various pruning techniques have been proposed to minimize the search space in a distributed environment, thus reducing the expense of processing. To our understanding, no algorithm was proposed to mine High Utility Sequential Patterns with negative item values in a distributed environment. So, we design a novel algorithm called DHUSP-N (Distributed High Utility Sequential Pattern mining with Negative values). DHUSP-N can mine high utility sequential patterns considering the negative item utilities from Bigdata.
  • Distributed Mining of High Utility Time Interval Sequential Patterns with Multiple Minimum Utility Thresholds

    Saleti S., Tangirala J.L., Thirumalaisamy R.

    Conference paper, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2021, DOI Link

    View abstract ⏷

    In this paper, the problem of mining high utility time interval sequential patterns with multiple utility thresholds in a distributed environment is considered. Mining high utility sequential patterns (HUSP) is an emerging issue and the existing HUSP algorithms can mine the order of items and they do not consider the time interval between the successive items. In real-world applications, time interval patterns provide more useful information than the conventional HUSPs. Recently, we proposed distributed high utility time interval sequential pattern mining (DHUTISP) algorithm using MapReduce in support of the BigData environment. The algorithm has been designed considering a single minimum utility threshold. It is not convincing to use the same utility threshold for all the items in the sequence, which means that all the items are given the same importance. Hence, in this paper, a new distributed framework is proposed to efficiently mine high utility time interval sequential patterns with multiple minimum utility thresholds (DHUTISP-MMU) using the MapReduce approach. The experimental results show that the proposed approach can efficiently mine HUTISPs with multiple minimum utility thresholds.
  • Distributed mining of high utility time interval sequential patterns using mapreduce approach

    Sumalatha S., Subramanyam R.B.V.

    Article, Expert Systems with Applications, 2020, DOI Link

    View abstract ⏷

    High utility sequential pattern (HUSP) mining algorithms aim to find all the high utility sequences in a sequence database. Due to the large explosion of data, a few distributed algorithms based on the MapReduce framework have recently been designed for mining HUSPs. However, the existing HUSP algorithms such as USpan, HUS-Span and BigHUSP capture only the order of items; they do not capture the time between items, that is, the time intervals between successive items. In a real-world scenario, time interval patterns provide more valuable information than conventional high utility sequential patterns. Therefore, we propose a distributed high utility time interval sequential pattern mining (DHUTISP) algorithm using the MapReduce approach that is suitable for big data. DHUTISP creates a novel time interval utility linked list (TIUL) data structure to efficiently calculate the utility of the resulting patterns. Moreover, two utility upper bounds, namely the remaining utility upper bound (RUUB) and the co-occurrence utility upper bound (CUUB), are proposed to prune unpromising candidates. We conducted various experiments to establish the efficiency of the proposed algorithm over both distributed and non-distributed approaches. The experimental results show the efficiency of DHUTISP over state-of-the-art algorithms, namely BigHUSP, AHUS-P, PUSOM and UTMining_A.
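    The remaining-utility idea behind bounds such as RUUB can be sketched as follows; this is the generic HUSP pruning argument, simplified to a single sequence, not the paper's precise definition: once a prefix is matched, the utility of the prefix plus everything that can still follow it bounds the utility of any extension, so a prefix whose bound falls below the threshold can be pruned.

        # Remaining-utility style upper bound (generic, simplified sketch).
        def remaining_utility_bound(position_utils, match_end):
            # position_utils: utility contributed at each position of a sequence
            # match_end: index just past the last position matched by the prefix
            prefix_util = sum(position_utils[:match_end])  # simplification:
            # every position before match_end is treated as part of the prefix
            remaining = sum(position_utils[match_end:])    # still reachable
            return prefix_util + remaining

        # If the bound (summed over all matching sequences) is below the
        # minimum utility, no extension of the prefix can be high utility.
        print(remaining_utility_bound([4, 1, 3, 2], match_end=2))  # 5 + 5 = 10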
  • A MapReduce solution for incremental mining of sequential patterns from big data

    Saleti S., Subramanyam R.B.V.

    Article, Expert Systems with Applications, 2019, DOI Link

    View abstract ⏷

    Sequential pattern mining (SPM) is a popular data mining task with broad applications. With the advent of big data, traditional SPM algorithms are not scalable. Hence, many researchers have migrated to big data frameworks such as MapReduce and proposed distributed algorithms. However, the existing MapReduce algorithms assume the data is static and do not handle incremental database updates. Moreover, they re-mine the entire updated database whenever new sequences are inserted. In this paper, we propose an efficient distributed algorithm for incremental sequential pattern mining (MR-INCSPM) using the MapReduce framework that can handle big data. The proposed algorithm incorporates a backward mining approach that efficiently reuses the knowledge obtained during the previous mining process. Also, based on the study of item co-occurrences, we propose the Co-occurrence Reverse Map (CRMAP) data structure. The combinatorial explosion of candidate sequences is dealt with using the proposed CRMAP data structure. Besides, novel candidate generation and early pruning mechanisms are designed using CRMAP to speed up the mining process. The proposed algorithm is evaluated on both real and synthetic datasets. The experimental results prove the efficacy of MR-INCSPM with respect to processing time, memory and pruning efficiency.
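    The abstract does not detail CRMAP's layout, so the sketch below shows only the general co-occurrence pruning idea it builds on (familiar from structures such as CMAP in the SPM literature): record which items have ever been observed to follow a given item, and generate an extension only when it has actually been observed.

        # Co-occurrence based candidate pruning (generic SPM sketch).
        from collections import defaultdict

        def build_cooccurrence_map(sequences):
            # follows[x] = items appearing after x in at least one sequence
            follows = defaultdict(set)
            for seq in sequences:
                for i, x in enumerate(seq):
                    follows[x].update(seq[i + 1:])
            return follows

        def candidate_extensions(pattern, items, follows):
            # extend only with items observed to follow the pattern's last item
            last = pattern[-1]
            return [pattern + [y] for y in items if y in follows[last]]

        db = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
        follows = build_cooccurrence_map(db)
        print(candidate_extensions(["a"], ["a", "b", "c"], follows))
        # [['a', 'b'], ['a', 'c']] -- 'a' is never followed by itself here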
  • A novel mapreduce algorithm for distributed mining of sequential patterns using co-occurrence information

    Saleti S., Subramanyam R.B.V.

    Article, Applied Intelligence, 2019, DOI Link

    View abstract ⏷

    The sequential pattern mining (SPM) problem has been much studied and extended in several directions. With the tremendous growth in the size of datasets, traditional algorithms are not scalable. To solve the scalability issue, a few researchers have recently developed distributed algorithms based on MapReduce. However, the existing MapReduce algorithms require multiple rounds of MapReduce, which increases communication and scheduling overhead. Also, they do not address the issue of handling long sequences: they generate a huge number of candidate sequences that do not appear in the input database, which enlarges the search space and leaves more candidate sequences for support counting. Our algorithm is a two-phase MapReduce algorithm that generates only promising candidate sequences using pruning strategies. It also reduces the search space, making support computation effective. We make use of item co-occurrence information, and the proposed Sequence Index List (SIL) data structure helps in computing the support quickly. The experimental results show that the proposed algorithm outperforms the existing MapReduce algorithms for the SPM problem.
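    A per-item index of the sequences containing each item gives fast support estimates; since the abstract does not define the Sequence Index List's exact contents, the sketch below shows only this generic inverted-index idea, which ignores item order and therefore yields an upper bound on support.

        # Support estimation via a per-item sequence-id index (illustrative).
        from collections import defaultdict

        def build_index(sequences):
            index = defaultdict(set)  # item -> ids of sequences containing it
            for sid, seq in enumerate(sequences):
                for item in set(seq):
                    index[item].add(sid)
            return index

        def support_upper_bound(candidate, index):
            # sequences containing every item; order is not checked, so this
            # is an upper bound on the candidate's true support
            return len(set.intersection(*(index[i] for i in candidate)))

        db = [["a", "b", "c"], ["a", "c"], ["b", "a"]]
        print(support_upper_bound(["a", "c"], build_index(db)))  # 2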
  • A novel bit vector product algorithm for mining frequent itemsets from large datasets using mapreduce framework

    Saleti S., Subramanyam R.B.V.

    Article, Cluster Computing, 2018, DOI Link

    View abstract ⏷

    Frequent itemset mining (FIM) is an interesting sub-area of research in the field of data mining. With the increase in the size of datasets, conventional FIM algorithms are no longer suitable, and efforts are being made to migrate to big data frameworks and design algorithms using MapReduce-like computing paradigms. We are likewise interested in designing a MapReduce-based algorithm. Initially, our parallel compression algorithm makes the data simpler to handle. A novel bit vector data structure is proposed to maintain compressed transactions, and it is formed by scanning the dataset only once. Our Bit Vector Product algorithm follows the MapReduce approach and effectively searches for frequent itemsets in a given list of transactions. Experimental results are presented to demonstrate the efficacy of our approach over some recent works.
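    The bit vector technique the abstract builds on can be sketched directly: each item is mapped to a bitmap over transactions, and the support of an itemset is the popcount of the bitwise AND (the "product") of its bitmaps. The transactions below are invented for the example; the paper's compression step is omitted.

        # Bit-vector support counting (illustrative sketch of the technique).
        def build_bitvectors(transactions, items):
            vectors = {}
            for item in items:
                bits = 0
                for tid, t in enumerate(transactions):
                    if item in t:
                        bits |= 1 << tid  # set the bit for transaction tid
                vectors[item] = bits
            return vectors

        def support(itemset, vectors):
            bits = vectors[itemset[0]]
            for item in itemset[1:]:
                bits &= vectors[item]  # AND intersects the transaction sets
            return bin(bits).count("1")  # popcount = supporting transactions

        db = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
        vecs = build_bitvectors(db, {"a", "b", "c"})
        print(support(["a", "b"], vecs))  # 2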

Scholars

Doctoral Scholars

  • Mr Ramdas Kapila
  • Ms A Sai Sunanda