As an AI/ML specialist at ISS STOXX's Technology Innovation Lab, I work on LLM infrastructure and NLP pipelines — from deploying and optimising models on internal GPU infrastructure to building systems that process ~700K news articles daily across 10+ languages for ESG controversy monitoring.
My doctoral research at IIT Bombay on Noun Compound Interpretation addressed two complementary aspects. The first was automatic interpretation: uncovering implicit semantic relations between nouns through supervised LSTM paraphrasing and through unsupervised approaches with BERT/RoBERTa and T5, with the unsupervised approaches outperforming their respective supervised baselines. The second was a linguistic contribution proposing FrameNet-based semantic labels, demonstrating that FrameNet data can generalize label sets and enable models to handle unseen relation labels. This work was published at ACL, EMNLP, COLING, and LREC.
I also design and deliver a company-wide "Introduction to LLM for Software Developers" workshop series (~70 participants per session) and advise internal teams on AI use-case design and implementation. In addition, I contribute to open-source NLP tooling (Hugging Face Transformers) and have open-sourced a vLLM monitoring dashboard built on Telegraf, InfluxDB, and Grafana.
The thesis addressed Noun Compound Interpretation (NCI): automatically uncovering implicit semantic relations between component nouns (e.g., student protest → "student is the doer of the protest"). The work spans two distinct axes:
1. Automatic interpretation: Multiple approaches of increasing sophistication, namely (a) supervised LSTM-based prepositional paraphrasing; (b) unsupervised fill-in-the-blank using BERT/RoBERTa; and (c) unsupervised free-form paraphrasing using T5. The unsupervised models outperformed prior supervised baselines.
2. Linguistic contribution: Proposed FrameNet-based semantic relation labels for noun compounds, and demonstrated that FrameNet data can be leveraged to generalize the label space — enabling interpretation models to handle unseen relation labels via zero-shot learning (ConvE embeddings over FrameNet).
Relevant courses: Foundation of ML, NLP and the Web, Artificial Intelligence, Topics in NLP (Machine Translation).
Research originated during a 2014 TRDDC internship. Internship mentor Girish K. Palshikar later became Ph.D. co-supervisor, giving a direct intellectual lineage from internship to completed Ph.D.
Addressed the Q-MER (Query Maximal Empty Rectangle) problem in external memory — finding the largest axis-parallel empty rectangle around a query point among massive point sets that exceed main memory.
Contributions: (1) Proposed an algorithm for an external priority search tree with linear space in log-linear time; (2) Showed that all standard range queries can be answered efficiently; (3) Developed an algorithm to solve MER with O(log n) extra space in main memory.
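For readers unfamiliar with the Q-MER problem statement, a minimal in-memory brute-force sketch in Python may help. It is not the external-memory algorithm from this work and makes no attempt at efficiency; it only illustrates what is being computed. The function name `largest_empty_rectangle`, the `bounds` parameter, and the restriction of the search to a bounding box are assumptions made for illustration. It relies on the observation that each side of a largest empty rectangle can be taken to lie on a point coordinate or on the bounding box.

```python
from itertools import combinations


def largest_empty_rectangle(points, query, bounds):
    """Brute-force Q-MER sketch (illustrative only, not the thesis algorithm).

    points: list of (x, y) tuples
    query:  (qx, qy) point that the rectangle must contain
    bounds: (x_min, y_min, x_max, y_max) bounding box (an assumption here)

    Returns (area, (left, bottom, right, top)); the rectangle is None
    if no rectangle with positive area exists.
    """
    x_min, y_min, x_max, y_max = bounds
    qx, qy = query

    # Candidate side positions: bounding-box edges plus point coordinates.
    xs = sorted({x_min, x_max, *(x for x, _ in points if x_min <= x <= x_max)})
    ys = sorted({y_min, y_max, *(y for _, y in points if y_min <= y <= y_max)})

    best_area, best_rect = 0, None
    # combinations() over a sorted list yields pairs with first < second.
    for left, right in combinations(xs, 2):
        if not (left <= qx <= right):
            continue
        for bottom, top in combinations(ys, 2):
            if not (bottom <= qy <= top):
                continue
            # "Empty" means no point lies strictly inside the rectangle;
            # points on the boundary are allowed.
            if any(left < x < right and bottom < y < top for x, y in points):
                continue
            area = (right - left) * (top - bottom)
            if area > best_area:
                best_area, best_rect = area, (left, bottom, right, top)
    return best_area, best_rect


if __name__ == "__main__":
    area, rect = largest_empty_rectangle([(3, 3)], (5, 5), (0, 0, 10, 10))
    print(area, rect)
```

This sketch enumerates every candidate rectangle and rescans all points for each one, so it is only practical for a handful of points. The external-memory setting described above exists precisely because the point sets of interest do not fit in main memory.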
Relevant topics: Probabilistic Reasoning · Pattern Recognition · Algorithms & Complexity · Optimisation Techniques · Automata & Formal Languages
Open-source Telegraf → InfluxDB 2.0 → Grafana monitoring stack for vLLM inference servers. Monitors TTFT, throughput, KV-cache utilisation, HTTP responses, and Python GC.
Bug reports (#4121, #4021), T5 pipeline solution (#3985), code review (PR #4367), and documentation commits.
Other contributions: keras_lr_finder · Streamlit · BetterBib · framenet-tools · ProbCog · NLP-progress (proposed noun compound task)