About me
I am a Ph.D. candidate in Computer Science at UC Irvine, specializing in trustworthy AI and machine learning. My research focuses on enhancing the security and robustness of large language models (LLMs), detecting social engineering threats, and advancing multimodal AI systems. I have developed methods to defend AI models against adversarial "jailbreak" attacks and techniques to identify phone-based scams. My work has been published in top conferences and journals and has accumulated over 200 citations.
Beyond academia, I have industry experience as a data science intern, applying machine learning to solve practical challenges. Passionate about teaching and mentorship, I have guided graduate students through machine learning and deep learning courses at UCI. Additionally, I actively contribute to the research community as a reviewer for respected journals, championing the ethical deployment of AI. I aim to leverage this blend of research excellence, industry insight, and teaching experience to drive innovation and ethical practices in AI.
Areas of Expertise
- Programming
Professional programming in Python, SQL, MATLAB, Java, and C/C++, with expertise in AI/ML frameworks.
- Data Analytics
Advanced pipeline development: web scraping, feature engineering, statistical analysis, and visualization.
- Machine Learning & AI
Advanced model development with PyTorch/TensorFlow: deep learning, adversarial defenses, and multimodal system optimization.
- Cybersecurity & Safety in AI
Focused on AI safety, including adversarial defenses for LLMs and detection of social engineering attacks.

Credentials
Education
- University of California, Irvine
2019–Present: Ph.D. Candidate in Computer Science
Research Focus: ML and AI Trustworthiness in NLP, LLMs, VLMs, and Multi-Modal Models, emphasizing alignment, safety, and reliability.
- University of California, Irvine
2019–2023: M.Sc. in Computer Science, GPA: 3.98/4.0
Completed course-based Master's degree during Ph.D.
- Sharif University of Technology
2014–2017: M.Sc. in Computer Engineering, GPA: 3.9/4.0
Specialization: Artificial Intelligence and Robotics
Thesis: "Analyzing Purchase Satisfaction Using Opinion Mining"
- K.N. Toosi University of Technology
2009–2014: B.Sc. in Computer Engineering - Hardware, GPA: 3.52/4.0
Thesis: "Text Summarization Using LSA and NMF"
Experience
- Research Assistant - Secure Systems and Software Laboratory
2019–Present: University of California, Irvine (Prof. Ian Harris)
• Developed Adversarial Prompt Shield (APS) to defend against jailbreaking attacks on LLMs
• Created novel machine learning approaches for detecting telephone-based social engineering attacks
• Led groundbreaking human studies on telephone scams with 186 participants
• Conducted NSF-funded research on detecting social engineering attacks
- Research Assistant
2017–2019: Sharif University of Technology (Prof. Hamid Beigy)
• Conducted innovative research on opinion mining techniques to analyze customer purchase satisfaction
• Leveraged advanced machine learning for large-scale social media analysis
• Completed thesis on analyzing purchase satisfaction using opinion mining
- Data Scientist Intern
Summers 2014 & 2016: DigiKala
• Developed innovative techniques for visualizing large-scale networks
• Conducted analysis on purchase satisfaction using comment data
• Contributed to improved user experience and product recommendations
Publications
- Robust Safety Classifier Against Jailbreaking Attacks: Adversarial Prompt Shield
Jinhwa Kim, Ali Derakhshan, Ian G. Harris. Proceedings of the 8th Workshop on Online Abuse and Harms (WOAH) at NAACL, 2024.
- Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield
Jinhwa Kim, Ali Derakhshan, Ian G. Harris. arXiv preprint, 2023.
- Mitra Behzadi at SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification Method Based on CLIP
Mitra Behzadi, Ali Derakhshan, Ian G. Harris. Proceedings of the 16th International Workshop on Semantic Evaluation, 2022.
- Detecting Telephone-Based Social Engineering Attacks Using Scam Signatures
Ali Derakhshan, Ian G. Harris, Mitra Behzadi. Proceedings of the ACM Workshop on Security and Privacy Analytics, 2021.
- Rapid Cyber-Bullying Detection Method Using Compact BERT Models
Mitra Behzadi, Ian G. Harris, Ali Derakhshan. IEEE 15th International Conference on Semantic Computing (ICSC), 2021.
- A Study of Targeted Telephone Scams Involving Live Attackers
Ian G. Harris, Ali Derakhshan, Marcel Carlsson. International Workshop on Socio-Technical Aspects in Security and Trust, 2020.
- Sentiment Analysis on Stock Social Media for Stock Price Movement Prediction
Ali Derakhshan, Hamid Beigy. Engineering Applications of Artificial Intelligence, 2019.
Peer Reviews
- TDSC Reviewer – Transactions on Dependable and Secure Computing (TDSC), 2023
- Complexity Reviewer – Complexity Journal (COMPLEXiTY), 2022
- SFI Reviewer – Springer Open Financial Innovation (SFI), 2021
- IJAMCS Reviewer – International Journal of Applied Mathematics and Computer Science (IJAMCS), 2021