Selected Publications
TigerLLM - A Family of Bangla Large Language Models
Nishat Raihan, Marcos Zampieri
ACL 2025 (Main Conference)
The first family of Bangla LLMs, achieving state-of-the-art performance.
mHumanEval - A Multilingual Benchmark to Evaluate Large Language Models for Code Generation
Nishat Raihan, Antonios Anastasopoulos, Marcos Zampieri
NAACL 2025 (Main Conference)
A massively multilingual (204 languages) benchmark for evaluating Large Language Models' code generation capabilities.
MojoBench: Language Modeling and Benchmarks for Mojo
Nishat Raihan, Joanna C. S. Santos, Marcos Zampieri
NAACL 2025 (Findings)
A complete LLM framework comprising a corpus, an instruction dataset, and a model that achieves state-of-the-art performance in Mojo code generation.
LLMs in CS Education: A Systematic Literature Review
Nishat Raihan, Mohammed Latif Siddiq, Joanna C. S. Santos, Marcos Zampieri
SIGCSE-TS 2025
A systematic literature review examining 125 papers on how LLMs are used and perceived in computer science education.
Code LLMs: A Taxonomy-based Survey
Author names here
IEEE Big Data 2024
A comprehensive taxonomy-based survey on Code LLMs, covering evaluation benchmarks, corpora, limitations, and open problems.
On the performance of large language models on introductory programming assignments
Nishat Raihan, Dhiman Goswami, Sadiya Sayara Chowdhury Puspo, Mohammed Latif Siddiq, Christian Newman, Tharindu Ranasinghe, Joanna Santos, Marcos Zampieri
Journal of Intelligent Information Systems - Springer
A framework with hundreds of programming prompts and MCQs from introductory CS courses, evaluating LLM performance in Python code generation and CS fundamentals.
Mentalhelp: A multi-task dataset for mental health in social media
Nishat Raihan, Sadiya Sayara Chowdhury Puspo, Shafkat Farabi, Ana-Maria Bucur, Tharindu Ranasinghe, Marcos Zampieri
LREC-COLING 2024
A comprehensive dataset for mental health support research.