[Home][Bio][Students][Research][Publications][Reports and Talks][Teaching][Miscellanea][Blog]
"The scientific understanding of life and mind in terms of information, computation, and feedback control."
(Steven Pinker's [1] nice summary of the overarching research programme to which I aspire to make some minute contributions.)
I have divided this page into two parts. The first part provides an
overview of the broad areas or themes that are primarily of interest to me,
and is hopefully written so as to be comprehensible even to those
with very different backgrounds. I have not actually worked in all of these
areas myself; in some cases I am just beginning to explore possibilities.
The second part includes brief descriptions
of the specific research projects I am currently working on or intend
to work on in the near future.
Should anything here catch your fancy, do drop me a line; I would be very
glad to discuss in more detail and to hear your thoughts.
Note to potential students/postdocs: We have some links with Imperial College London, as part of the Imperial College – India Biomathematics Bridge,
and there may be possibilities for joint appointments [postdoc advertisement]. Please get in touch
if interested.
Areas/Themes
- Pattern Recognition and Machine Learning
What is it that characterises human intelligence? Arguably a key part of the
answer is the ability to recognise patterns [2] in the world
around us, and to attach meaning to them. Artificial intelligence (AI)
is a branch of computer science which seeks to program computers to be able to
replicate intelligent human behaviour; its origins date back to Turing's
landmark 1950 paper [3], in which he proposed that a computer could be regarded as truly intelligent only if it could not be distinguished from a human being in
a conversation with another human being, a notion now famous as the Turing test. Early efforts
at AI focused on attempting to formalise human reasoning, and specifying sets
of rules which a machine could follow to replicate intelligent human behaviour.
However, this approach soon ran into problems, as the number of rules needed
began to proliferate beyond the feasible limit for tasks of even moderate
complexity. Notably, in 1973, Sir James Lighthill's report for the British government
concluded that the AI programme had essentially been a failure,
and led to the end of support
for AI research at the vast majority of British universities.
One of the lessons drawn from these early troubles was that we needed to better understand
how human intelligence works, and make use of that. Evidently, we humans don't require
endless sets of rules for performing complex tasks; we learn from experience. Children
learn to walk, and speak, and read, and recognise faces; all very hard tasks from a
computational perspective, tasks that no machine or robot to date has accomplished
entirely successfully. But children don't need to be explicitly told rules or algorithms for
any of these tasks; they just learn automatically, by observation and trial and error!
In other words, our brains come equipped with some sort of abstract framework for
making sense of the world around us, and we are able to fill in the details via learning
from experience (in linguistics, this corresponds to the principles-and-parameters
framework). Machine learning is the name given to the area of study that
seeks to equip computers with the same capability.
The key to machine learning is to think of the world in terms of adaptive
mathematical models. The nature of the model provides a framework, and
tunable parameters within the model allow for adaptation to observations.
Most prominent machine learning approaches are statistical in nature, i.e.,
they seek to detect statistical patterns in the observed data and exploit
these for learning. Some types of models, such as neural networks,
have been directly inspired by what we know of our own brains, but there
exist a wide variety of methods that have proven useful for particular kinds of tasks. One side of machine learning is very practical: it's about trying to
efficiently automate useful tasks to free up human time and resources.
However, there is also a grander scientific goal: to better understand the nature of intelligence (both human and computational), as well as its frontiers.
Machine learning also has strong and fascinating connections with key issues in the philosophy of science [4][5][6]; in particular, the problem of induction, which has roots in
ancient Indian and Greek philosophy, and is most famously associated with David Hume [7]. I am greatly interested in exploring and better understanding these connections.
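The 'adaptive model' idea above can be made concrete with a toy sketch: a model family (here, a straight line y = w·x + b) provides the framework, and gradient descent tunes the parameters w and b to fit observed data. The data points and learning rate below are purely illustrative.

```python
# Toy illustration of machine learning as an adaptive model:
# the model family y = w*x + b is the framework; the tunable
# parameters w, b adapt to the observations via gradient descent.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]   # noisy observations of roughly y = 2x + 1

w, b = 0.0, 0.0                  # initial parameter guesses
lr = 0.02                        # learning rate (step size)
for _ in range(5000):
    # gradients of the mean squared error with respect to w and b
    gw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    gb = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * gw
    b -= lr * gb

print(round(w, 1), round(b, 1))  # learned parameters, close to 2 and 1
```

The same loop, with a richer model family (a neural network) and more data, is essentially how most statistical machine learning proceeds.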
- Cognitive Science
I am specifically interested in understanding how the human mind processes linguistic and visual information, and the kinds of representational and computational processes involved therein. We seek to use machine learning models to better understand, or 'reverse engineer', the learning and cognition capabilities of the human mind in these domains.
- Computational Linguistics
- Complex Networks
- Systems Biology
- Evolution and Evolvability
- Darwinian Literary Studies
- Public Health Informatics
Current Primary Research Topics and Questions
Note that the common thread amongst all of these is machine learning; my essential interest is in the application and development of machine learning techniques to help us scientifically model, simulate, and understand all of these complex natural systems – how they process information, learn from experience, and make decisions.
Cognitive Science
This is a very broad area, concerned with understanding the human mind, intelligence, and cognition: typically in computational terms. I work primarily on the computational modelling of human language processing (i.e., computational psycholinguistics, an area that is related to natural language processing or NLP but focuses more on understanding mental mechanisms). The kinds of questions we seek to address are to do with how the mind produces and comprehends language; for instance:
- Why is it that certain ways of saying or writing something (in particular, of ordering the words in a sentence) are easier to produce/understand than other ways, even when the meaning remains unchanged? [8]
- What might this tell us about the cognitive constraints on language processing? [9][10][11]
- In what ways might prevalent language models in NLP, such as those based on RNNs or Transformers, be able to capture aspects of human language learning and cognition? [12][13]
We have a particular focus on Indo-Aryan languages such as Hindi-Urdu.
Another topic of recent interest has been how humans perceive and reason about AI systems and technologies, and what factors shape human users' trust (or lack thereof) in the use of machine intelligence in different application domains:
- To what extent does people's perceived understanding or knowledge of the workings of AI algorithms influence their attitudes towards such systems? And what shapes the sense of understanding itself: is it more individual or more collective? [14][15]
- How does human acceptance of intelligent machines relate to the perceived moral dimensions of the behaviour of those machines? How can we model and understand human moral reasoning in the context of human-machine interactions? [16][17]
Systems and Evolutionary Biology
Biology is becoming increasingly quantitative and driven by computational modelling [18]. Systems biology refers to an approach that seeks to model biological systems holistically, with a particular emphasis on the interactions between different components of a biological system and the kind of behaviour/functionality that emerges from such interaction. In pursuing such modelling, there turns out to be a very important role for a wide range of concepts and tools developed in EE/CS: looking at such systems as information/signal-processing systems, which employ complex biological circuitry for the purposes of computation and control of many processes essential to life (e.g., [19]). My focus is primarily on looking at systems at the subcellular level, i.e., within single cells. At this level the key players are proteins, which are the essential building blocks and workhorses of all living cells.
- How can we model and simulate the ways in which thousands of genes and proteins, acting in concert, give rise to the tremendous variety of functions and responses that each cell is capable of?
- How can we 'reverse engineer' the internal circuitry of cells from observed experimental data on cell physiology, i.e., all the ways in which cells change their state through time and respond and react to environmental stimuli? [20][21]
- Can such models help us to better understand what happens when cells start malfunctioning (e.g., cancerous cells), and thus develop better targeted drugs or therapies?
- How can we formally model and understand the mechanisms and constraints governing the process via which these systems developed in the first place, i.e., evolution? [22][23]
- Can evolution be seen as a form of inter-generational learning from experience, and hence formalised and studied using ideas from machine learning? [24]
Computational Social Science
This is a newer and less well-defined area than the others; nevertheless, it is becoming increasingly exciting for its scope and potential to improve human lives. The 'complex system' involved here is even more complex than for the above areas: it is now an entire society that we seek to, at some level, model and understand!
We are particularly interested in public health. A key idea of public health is to look at health as a social phenomenon: something which depends not just on individual genes or circumstances or choices or behaviour, but something which is strongly influenced by the social environment around us.
- One specific aspect we are working on involves immunisation behaviours and attitudes: what kinds of socioeconomic factors influence our uptake of and level of trust in vaccines, and how can we model and forecast this, especially at a local level? [25][26]
- Vaccine refusal is a major public health problem worldwide, and causes many unnecessary deaths. So an ability to understand and mitigate the factors that lead to vaccine hesitancy could lead to major gains in public health – topically, in the context of the ongoing mass vaccination campaign for COVID-19. [27]
- There is also a lot of scope for using social networks to understand many public health phenomena [28], especially now that large amounts of social network data are potentially available from a range of sources: Facebook/Twitter, mobile phone network data, etc.
- Such data are increasingly being used to understand social demographics [29], many of which are likely to be relevant to health outcomes.
- Remote sensing data from satellites have also been used for similar ends [30], and hence such projects could also involve an image processing or computer vision component. Such data can be used to estimate a number of useful socioeconomic indices, such as population and development metrics, at more fine-grained spatial resolutions than would be available from standard census or survey data.
Further information on some ongoing/past projects
- Uncovering Subcellular Regulatory Networks
(DST-funded, under the INSPIRE Faculty Fellowship.)
The post-genomic era has produced masses of data on subcellular biological systems: gene
expression microarrays, protein-protein interactions, and metabolic pathways. A key challenge is to
leverage this data to gain functional understanding of the underlying systems and mechanisms.
Network biology seeks to do this by using the mathematical abstraction of a graph to represent
systems comprising many interacting components. One question relevant to the study of these
networks is whether we can identify typical structural 'signatures'; such signatures can guide the
inference of networks from data. I would like to investigate this possibility by examining the
structure of such networks from multiple different perspectives, and attempting to detect patterns of
interest in an automated, data-driven fashion.
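One classic example of such a structural 'signature' is the network motif: a small subgraph that recurs far more often than chance would predict. The sketch below counts one well-known motif, the feed-forward loop (X→Y, X→Z, Y→Z), in a toy directed network; the edge list is purely illustrative, not real regulatory data.

```python
# Count feed-forward loops (X->Y, X->Z, Y->Z) in a toy directed network.
# The edge list is purely illustrative, not real regulatory data.
edges = {("A", "B"), ("A", "C"), ("B", "C"),   # A,B,C form one feed-forward loop
         ("C", "D"), ("B", "D"),               # B,C,D form another
         ("D", "E")}

nodes = {n for edge in edges for n in edge}

def count_ffl(edges, nodes):
    """Brute-force scan over ordered node triples (x, y, z)."""
    count = 0
    for x in nodes:
        for y in nodes:
            for z in nodes:
                if len({x, y, z}) == 3 and \
                   (x, y) in edges and (x, z) in edges and (y, z) in edges:
                    count += 1
    return count

n_ffl = count_ffl(edges, nodes)
print(n_ffl)  # 2 loops in this toy network
```

In practice one would compare such counts against those in randomised networks with the same degree sequence, to decide whether the motif is statistically over-represented.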
Another question is whether we can develop richer models, by integrating biological
interactions at multiple levels. Our picture of gene control and regulation has become increasingly
complex, with a variety of novel data on the role of non-coding RNA and RNA interference,
protein-DNA interactions, 3-D chromosome structure etc. These interactions all represent aspects of
a single system, with myriad information flows between them, but there have been only a few
restricted efforts to put the pieces together. I propose to work towards developing a framework
which allows for modelling multiple aspects of the cellular machinery at different levels.
[Grant proposal][Slides]
Reports of work done on this project:
S. N. Karishma, Summer internship report, July 2013
Monalisa, M.Tech. thesis, May 2014
Aastha, M.Tech. thesis, May 2014
Alok Singhal, M.Tech. thesis, June 2014
Deepika Vatsa, Ph.D. proposal, March 2015
Rishabh Dudeja, B.Tech. thesis, May 2015
Ashesh, M.Tech. thesis, July 2015
Abdul Hadi Shakir, M.Tech. thesis, July 2015
Deepali Jain and Suchakra Sah, B.Tech. thesis, November 2015
Sahil Loomba and Parul Jain, B.Tech. thesis, August 2016
Tarun Mahajan, M.S.(R) thesis, December 2017
Abhishek Pathak, Independent Study report, May 2018
Shruti Kaushal, M.Sc. thesis, August 2018
- Monitoring Public Confidence in Immunisation Programmes
(In collaboration with Heidi Larson and Alex de Figueiredo at the London School
of Hygiene & Tropical Medicine, and Nick Jones at Imperial College London.)
This is in conjunction with the Vaccine Confidence Project based at the London School of Hygiene & Tropical Medicine. The idea is to use multiple information sources, such as news and social media, to monitor people's confidence in vaccines. Vaccine refusal is a serious problem for public immunisation programmes across the world, and is often a consequence of the spread of
misinformation or rumours, whether accidental or malicious.
Two prominent examples are the polio vaccination effort in India, which for many years faced pockets
of vaccine hesitancy, especially amongst some communities in UP and Bihar, before India recorded its last case of wild polio in 2011; and the MMR vaccine in the UK, which was wrongly linked to autism by the
now-discredited Andrew Wakefield.
The objective of this project is to set up an
early-warning and prediction system for such episodes of loss in vaccine confidence, using a machine learning approach to model the contributions
of different factors, which may include high-frequency information gleaned from news or social media (drawing on earlier work [31][32]), as well as more slowly changing variables like literacy rates and telephone penetration. One possible outcome is a kind of 'Vaccine Confidence Index' (along the lines of the Human Development Index or stock market indices), which appropriately combines the available indicators into an overall metric of the level of concern in a given context; this could then be used to guide early prevention and response
efforts by the public health community.
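To give a flavour of how such an index might be assembled (in the spirit of the HDI's rescale-and-combine construction): each indicator is rescaled to [0, 1] against assumed bounds, then weighted and summed. All indicator names, values, bounds, and weights below are hypothetical placeholders, not estimates from this project.

```python
# Hypothetical sketch of a composite 'Vaccine Confidence Index':
# rescale each indicator to [0, 1], then combine with weights.
# All names, values, bounds, and weights are illustrative only.
indicators = {                    # raw values for one imaginary region
    "sentiment_score":  0.3,      # news/social-media sentiment, in [-1, 1]
    "literacy_rate":   74.0,      # percent
    "phone_coverage":  62.0,      # percent
}
bounds = {                        # assumed min/max used for rescaling
    "sentiment_score": (-1.0, 1.0),
    "literacy_rate":   (0.0, 100.0),
    "phone_coverage":  (0.0, 100.0),
}
weights = {                       # illustrative weights summing to 1
    "sentiment_score": 0.5,
    "literacy_rate":   0.3,
    "phone_coverage":  0.2,
}

def confidence_index(indicators, bounds, weights):
    index = 0.0
    for name, value in indicators.items():
        lo, hi = bounds[name]
        index += weights[name] * (value - lo) / (hi - lo)
    return index

vci = confidence_index(indicators, bounds, weights)
print(round(vci, 3))  # a single number in [0, 1] summarising confidence
```

In the actual project, the weights would of course not be hand-picked but learned from data, e.g., by regressing observed vaccine uptake on the candidate indicators.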
- Evolving Synthetic Genetic Oscillators
(In collaboration with Shaunak Sen.)
We would like to use evolutionary or genetic algorithms to obtain in silico small genetic networks that display specific oscillation dynamics (e.g., they oscillate with a specified amplitude or phase), along the lines of previous work [33][34]. Eventually, our goal is to scale this up to study and understand the evolution of larger-scale biological networks, both topology and dynamics, as well as the relationship between them. For instance, can we explain features like network modularity and motifs [35][36] in terms of their role in network dynamics?
[Abstract]
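The basic evolutionary loop involved can be sketched in miniature. As a toy stand-in for simulating a genetic circuit's dynamics, the genome here is just a pair (amplitude, period) of a sine wave, and fitness is how closely the resulting trace matches a target oscillation; in the real project the genome would encode circuit parameters and the trace would come from simulating the network's differential equations. All numbers below are illustrative.

```python
# Toy (mu + lambda)-style evolutionary search for oscillator parameters.
# The genome (amplitude, period) is a stand-in for the rate constants of
# a genetic circuit; fitness compares the trace to a target oscillation.
import math
import random

random.seed(0)

TARGET_AMP, TARGET_PERIOD = 2.0, 5.0
times = [0.1 * i for i in range(100)]
target = [TARGET_AMP * math.sin(2 * math.pi * t / TARGET_PERIOD) for t in times]

def fitness(genome):
    amp, period = genome
    trace = [amp * math.sin(2 * math.pi * t / period) for t in times]
    return -sum((a - b) ** 2 for a, b in zip(trace, target))  # higher is better

def mutate(genome):
    # small Gaussian perturbation, clipped to stay positive
    return tuple(max(0.1, g + random.gauss(0.0, 0.1)) for g in genome)

# initial random population, then: keep the best half, refill by mutation
pop = [(random.uniform(0.5, 4.0), random.uniform(1.0, 10.0)) for _ in range(40)]
for generation in range(200):
    pop.sort(key=fitness, reverse=True)
    pop = pop[:20] + [mutate(random.choice(pop[:20])) for _ in range(20)]

best = max(pop, key=fitness)
print(best)  # should approach (2.0, 5.0)
```

Replacing the sine-wave 'simulation' with a numerical ODE solver for a candidate gene network, and the genome with that network's topology and rate constants, gives the in silico evolution set-up described above.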