Research Summary.
Personal robots and other embodied assistants (e.g. self-driving vehicles, smart home systems) are largely intended to provide intuitive assistance to people in their daily lives, yet we cannot program a priori all of the task knowledge they will need. Thus, to provide the desired assistance, these agents need the ability to dynamically acquire and expand their task domain knowledge. One key recurring challenge arises when the robotic agent is given a high-level task, described by an abstract task plan (or recipe). To perform the task, the robot must first perceptually ground each entity and concept within the recipe (e.g. items, locations). For example, to learn to serve cooked pasta in a home, a robot must first ground concepts such as cooking pot, stove, and bottle of pasta sauce. When the robot has no prior knowledge, this is particularly challenging in newly situated or non-stationary environments, where it has limited representative training data. This thread of research examines how a social robotic agent can use interaction with a human partner to efficiently learn to ground task-relevant concepts in its environment.
Initial Work (Passive Learning from Demonstration).
Our initial work investigated Learning from Demonstration (LfD) approaches for acquiring (1) training instances as examples of task-relevant concepts and (2) informative features for appropriately representing and discriminating between task-relevant concepts. The goal of this project was to leverage interaction with humans to enable sample-efficient concept grounding. Our findings validated the usefulness of exploiting user domain knowledge in this problem setting. However, they also relied on human partners both being proficient teachers and tracking the robot's knowledge over time, so as to know what new information to provide and when. This is because the agent played the role of passive observer, as is typically assumed in LfD settings. Such a cognitive load is an unreasonable burden to place on users, particularly the expectation of tracking a robot's knowledge over time in a non-stationary environment (as is the case in real-world settings).
Relevant Publications.
Human-Driven Feature Selection for a Robotic Agent Learning Classification Tasks from Demonstration (ICRA, 2018) [PDF]
Grounding Action Parameters from Demonstration (RO-MAN, 2016) [PDF]
Later Work (Active Learning).
In later work, we contributed a general decision-theoretic active learning framework (left) that enables a learner to autonomously manage its interaction with a human partner. The agent infers both when to request help and what type of information to query, based on its expected learning progress. We extended this framework to additionally account for the teacher's time and cognitive load constraints within the agent's objective function. Overall, this later body of work gives rise to richer communication and a more flexible learning mechanism, in which an agent can both (a) initiate different types of communication actions with a teacher and (b) adapt to the teacher's time and availability constraints. It is inspired by the expressive decision-making capabilities of human learners.
Relevant Publications.
Active Learning within Constrained Environments through Imitation of an Expert Questioner (IJCAI, 2019) [PDF]
Towards Intelligent Arbitration of Diverse Active Learning Queries (IROS, 2018) [PDF]
Relevant Talks.
Active Learning in Realistic Human Settings (ICML Workshop on Human in the Loop Learning, 2020) [Invited Talk — 30 min]
Active Learning (Project II).
My later work explored strategies for enabling a social robot learner to autonomously manage its own learning interactions with a human teacher, towards actively gathering diverse types of task knowledge. In this thread of work, the agent no longer plays the role of passive observer; it becomes an active questioner. Assuming no prior knowledge, the learning agent is given a task (e.g. serving pasta) and, with it, task-relevant concepts (e.g. cooking pot, pasta sauce) that it must perceptually ground in order to later recognize instances of these concepts in the situated environment and use them to perform the task. The agent learns to ground all task-relevant concepts by actively querying its human partner for relevant information. I led two investigations within this project on managing interaction with a human teacher; both are described below.
Active Learning of Grounded Concepts using Diverse Types of Learning Queries
The first investigation contributed two classes of reasoning approaches for arbitrating between diverse types of active learning queries, with the goal of autonomously gathering both representative examples of the concepts and informative features for discriminating between them [Bullard et al, IROS 2018]. In this work, we examined both rule-based strategies and a decision-theoretic reasoning framework for selecting between multiple learning queries of different types at each turn of a longer-horizon learning episode. Importantly, the decision-theoretic framework also enables the agent to reason about when not to make queries, so as to avoid acquiring redundant information when the environment is not changing and to avoid unnecessarily disturbing the teacher. The video below shows a demo of the robot using the decision-theoretic framework to infer when and what to query.
The learning problem is to infer a mapping between a set of abstract general concepts (given a priori) and how they are perceptually grounded in the agent's environment. In the demo below, the concepts are relevant to a Pack Lunchbox task. The agent must learn from the teacher (myself, as it were) different ways of grounding the concepts (1) main dish, (2) fruit, (3) snack, and (4) beverage, with the constraint that all examples should be appropriate for the lunch-packing task.
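The idea of choosing between query types at each turn, including the option of not asking at all, can be illustrated as a simple expected-utility comparison. This is a minimal sketch rather than the framework from the paper: the query type names, the gain and cost numbers, and the plain `gain - cost` utility are all hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueryOption:
    name: str             # hypothetical query type label
    expected_gain: float  # assumed estimate of learning progress from asking
    cost: float           # assumed cost of interrupting the teacher


def select_query(options: List[QueryOption], abstain_utility: float = 0.0) -> str:
    """Return the query with the highest expected utility (gain minus cost),
    or "no-query" if no option beats the utility of staying silent."""
    best_name, best_utility = "no-query", abstain_utility
    for opt in options:
        utility = opt.expected_gain - opt.cost
        if utility > best_utility:
            best_name, best_utility = opt.name, utility
    return best_name


if __name__ == "__main__":
    turn = [
        QueryOption("demonstration-query", expected_gain=0.30, cost=0.10),
        QueryOption("feature-query", expected_gain=0.45, cost=0.20),
        QueryOption("label-query", expected_gain=0.05, cost=0.10),
    ]
    print(select_query(turn))  # feature-query
    # If the environment has not changed, expected gains are low,
    # so no option beats abstaining and the agent stays silent.
    stale = [QueryOption("label-query", expected_gain=0.02, cost=0.10)]
    print(select_query(stale))  # no-query
```

Making "no-query" an explicit alternative with its own utility is what lets this kind of rule decline to disturb the teacher when nothing worthwhile is left to ask.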
Active Learning in Learning Environments with Externally Imposed Constraints
Typical Active Learning approaches focus on optimal query selection without considering the learning context in which the agent is situated. However, an important aspect of interactive learning (settings where the agent must rely on a human teacher, often co-located in the same physical space) is how the learner should integrate reasoning about the teacher's constraints (e.g. the time frame during which the teacher will be present, the cognitive load of answering questions). This second investigation builds upon the previous work (using the decision-theoretic framework and problem setting above) and examines how to enable an active learner to reason about its learning objectives while concurrently considering time and query budget constraints, assumed to be implicitly given by the teacher. In particular, we formulated the problem as a joint optimization of the agent's internal learning objectives and the externally imposed constraints, all quantified within the agent's objective function. To infer how to trade off such a diverse set of decision criteria, and inspired by how humans reason over complex decision problems, we take the approach of imitating the strategy of an expert questioner [Bullard et al, IJCAI 2019].
We evaluated this work using two types of constraints: (a) time and (b) cognitive availability of the human partner. Time denotes the time frame during which the human will be co-located in the same space as the agent, though possibly doing other things. Query budget is a proxy for how cognitively available the human is to answer questions while present. Given these, we tested four experimental conditions: (1) short time present, low query budget, (2) long time present, low query budget, (3) short time present, high query budget, and (4) long time present, high query budget. Condition 1 is the most constrained, intuitively representing the scenario where the teacher is not around for very long (e.g. 30 minutes before leaving the house) and is also busy during that time, so has little cognitive capacity for answering questions. Condition 4, by contrast, is the least constrained, intuitively representing the scenario where the teacher is both around for an extended period of time (e.g. at home all weekend) and very cognitively available during that time to answer whatever questions the agent has. The image below depicts a sample human-robot interaction from Condition 4. The ideal behavior here is for the learner to recognize that the human partner will be around for a long time and is very available to answer questions (e.g. if the human has explicitly set aside time to teach), and to leverage this time to ask as many questions as it has.
(on the left): The robot makes good use of the teacher's time by continuously engaging her. It uses the objective function that jointly optimizes for the learning task and the learning environment, which imposes time and query budget constraints (DT-Task-Env). Thus, it is able to explicitly reason about the time it has to ask questions and the query budget allocated for doing so, and to exploit this information.
(on the right): The robot largely wastes the teacher's time by not engaging her, even though she has explicitly set aside time to be there and answer questions, and she in turn becomes bored over time. This is the baseline condition, which uses an objective function where the agent reasons only about the learning task (DT-Task, as above, and the approach commonly employed in the Active Learning literature). This robot cannot adapt to the teacher's constraints, so it always employs the same query selection strategy, independent of the teacher's time and cognitive availability.
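The contrast between the task-only objective (DT-Task) and the task-plus-environment objective (DT-Task-Env) can be made concrete with a toy decision rule. This is a hand-weighted stand-in, not the method from the paper, where the trade-off between criteria was inferred by imitating an expert questioner; the weights, the threshold, the "pace" term, and the numeric encodings of the four conditions below are all hypothetical.

```python
def dt_task(gain: float, threshold: float = 0.5) -> bool:
    """Baseline (task-only): query whenever the expected learning gain clears
    a fixed threshold, ignoring the teacher's time and query budget."""
    return gain >= threshold


def dt_task_env(gain: float, budget_left: int, steps_left: int,
                w_gain: float = 1.0, w_pace: float = 1.0,
                threshold: float = 0.5) -> bool:
    """Task + environment: add a hypothetical 'pace' term (query budget left
    per remaining time step), so an available teacher lowers the bar."""
    if budget_left <= 0 or steps_left <= 0:
        return False  # budget exhausted or teacher gone: no queries possible
    pace = min(1.0, budget_left / steps_left)
    return w_gain * gain + w_pace * pace >= threshold


# Hypothetical encodings of the four conditions: (time steps present, query budget)
CONDITIONS = {
    1: (10, 2),   # short time present, low query budget
    2: (40, 2),   # long time present, low query budget
    3: (10, 20),  # short time present, high query budget
    4: (40, 20),  # long time present, high query budget
}

if __name__ == "__main__":
    gain = 0.2  # a modestly useful query
    for cond, (steps, budget) in CONDITIONS.items():
        print(cond, dt_task(gain), dt_task_env(gain, budget, steps))
```

With the same modest gain, the baseline never asks in any condition, while the environment-aware rule asks in Conditions 3 and 4, where budget is plentiful relative to the remaining time, and holds back in Conditions 1 and 2.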
My initial work in LfD focused on how to leverage structured interaction with a human partner to learn the perceptual groundings for all task-relevant concepts. The primary difference between this thread of research and the active learning research (Project II, above) is that the agent's role here is that of a passive observer rather than an active questioner. I led two experimental investigations within this project on leveraging interaction: the first towards acquiring representative training instances, the second towards eliciting informative features. Throughout this entire body of work, the goal for the agent was to perceptually ground all concepts (i.e. items, locations) relevant to a specific task; what changes is the way the agent exploits interaction with a human partner to achieve this goal.
The image to the right shows an example of the Curi robot's workspace while it observes task demonstrations provided by a human teacher (me, as it were).
LfD for Grounding Task-Relevant Concepts
Actions in a given task plan or recipe are often parameterized by object and semantic location symbols (concepts) relevant for task execution, but not grounded in the physical environment where the robot is situated. In this first investigation, we employed the paradigm of Learning from Demonstration (LfD) to understand how many demonstrations of a task were necessary to perceptually ground all of the given task-relevant concepts in the agent's environment [Bullard et al, RO-MAN 2016]. Assuming the robot does not already have classifiers for identifying instantiations of the concepts in its environment, we sought to leverage LfD for efficiently acquiring the relevant task knowledge in any new environment where the robot may be placed.
For evaluation of this work, the agent had to ground concepts for two different tasks, each in three different experimental environments (shown below). The object and semantic location concepts to be grounded are derived from the parameters of the task recipes given to the agent (like the ones shown below). Each environment represents a kitchen the agent could be placed in and thus contains the same abstract objects and locations, but instantiated differently (as one would expect in different homes). The agent's goal was to learn a binary classifier for each of the abstract concepts (labels), given demonstrations of each task from the teacher.
Serve Salad Task Recipe
pick-place <bowl, cupboard, counter>
pick-place <salad-dressing, fridge, counter>
Serve Pasta Task Recipe
pick-place <pasta-pot, stove, counter>
pick-place <bowl, cupboard, counter>
pick-place <sauce-bottle, fridge, counter>
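The step from recipe parameters to a set of concepts, and from demonstrations to one binary training set per concept, might be sketched as follows. This is an illustration under assumptions not stated above: demonstrations are assumed to arrive pre-segmented, one (object, source, target) feature triple per recipe step; the feature vectors are made-up placeholders; and negatives are drawn one-vs-rest from the other concepts of the same kind (object or location).

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Step = Tuple[str, str, str, str]  # (action, object, source, target)
Features = Tuple[float, ...]      # hypothetical perceptual feature vector

SERVE_SALAD: List[Step] = [
    ("pick-place", "bowl", "cupboard", "counter"),
    ("pick-place", "salad-dressing", "fridge", "counter"),
]


def concepts_to_ground(recipe: Sequence[Step]) -> Tuple[Set[str], Set[str]]:
    """Collect the object and semantic-location symbols named in a recipe."""
    objects: Set[str] = set()
    locations: Set[str] = set()
    for _action, obj, src, dst in recipe:
        objects.add(obj)
        locations.update((src, dst))
    return objects, locations


def binary_training_sets(
    recipe: Sequence[Step],
    demos: Sequence[Sequence[Tuple[Features, Features, Features]]],
) -> Dict[str, Tuple[List[Features], List[int]]]:
    """Turn demonstrations into one-vs-rest datasets, one per concept.
    Assumes each demonstration holds one feature triple per recipe step."""
    positives: Dict[str, List[Features]] = defaultdict(list)
    kind: Dict[str, str] = {}
    for demo in demos:
        for (_action, obj, src, dst), (f_obj, f_src, f_dst) in zip(recipe, demo):
            for concept, feats, k in ((obj, f_obj, "object"),
                                      (src, f_src, "location"),
                                      (dst, f_dst, "location")):
                positives[concept].append(feats)
                kind[concept] = k
    datasets = {}
    for concept, pos in positives.items():
        # Negatives: examples of the other concepts of the same kind.
        neg = [f for other, fs in positives.items()
               if other != concept and kind[other] == kind[concept]
               for f in fs]
        datasets[concept] = (pos + neg, [1] * len(pos) + [0] * len(neg))
    return datasets


if __name__ == "__main__":
    print(concepts_to_ground(SERVE_SALAD))
    demo = [((0.9, 0.1), (0.2, 0.8), (0.5, 0.5)),
            ((0.1, 0.7), (0.3, 0.9), (0.6, 0.4))]
    for concept, (X, y) in binary_training_sets(SERVE_SALAD, [demo]).items():
        print(concept, y)
```

Any standard binary classifier could then be fit to each concept's `(X, y)` pair; the sketch leaves the choice of classifier open. Note that a location used twice in one demonstration (here, counter) contributes two positive examples.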
Experimental findings showed that environment-specific groundings of all task-relevant objects and semantic locations could be learned efficiently by employing the paradigm of LfD. In all environments, learning began to stabilize after only about 5-6 task demonstrations.
The key implication of this work was that we could leverage interaction with a human partner to solve the task-situated symbol grounding problem; subsequent work relaxed some of the assumptions made in this first interactive learning project and built upon this key insight.
Human-Driven Feature Selection for Representing Task-Relevant Concepts
Feature selection directly impacts how quickly the agent is able to learn a sufficient model for each abstract concept given in a task recipe. The challenge is that humans typically provide only a small number of examples to an agent, which may be insufficient for computational feature extraction techniques, and it is not feasible to hand-code features for every task a priori. Thus, this subsequent investigation examined whether humans can also help the agent identify informative features for discriminating between task-relevant objects, and how to elicit this feature information from a human teacher [Bullard et al, ICRA 2018]. It contributed five different approaches for human-driven feature selection.
The images below depict three of the approaches explored. In this work, we used an Unpack Groceries task, where the agent had to learn to ground the following concepts: (a) beverages, (b) produce, (c) food cans/jars, and (d) snacks. The human instance selection (HIS) approach (shown on left) represents the typical LfD case, where a teacher provides a small number of representative training examples of each concept, with the caveat that the teacher's explicit goal when selecting these examples is to help the learner differentiate between the concepts. This provides a way for the teacher to indirectly communicate useful features, given that some features may be abstract or difficult to articulate. The human feature selection (HFS) and human feature reduction (HFR) approaches (depicted on right) allow the teacher to directly enumerate informative features to the learner. HFS highlights all features believed to be useful for differentiating between the task concepts, while HFR eliminates all features that the learner should ignore. Note that the images show only simplified versions of these experimental conditions, for illustration purposes. The last two approaches combined these two styles: the teacher first selects a small set of training examples and then explicitly enumerates the features he/she was attempting to highlight implicitly through the instances selected.
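The difference between HFS and HFR can be viewed as two ways of turning teacher input into a feature mask. This is a minimal sketch: the feature names and values are invented for illustration and are not the features used in the study.

```python
from typing import Dict, Iterable, List

Instance = Dict[str, float]  # feature name -> value (names are hypothetical)


def human_feature_selection(instances: Iterable[Instance],
                            selected: Iterable[str]) -> List[Instance]:
    """HFS: keep only the features the teacher highlighted as useful."""
    keep = set(selected)
    return [{k: v for k, v in x.items() if k in keep} for x in instances]


def human_feature_reduction(instances: Iterable[Instance],
                            eliminated: Iterable[str]) -> List[Instance]:
    """HFR: keep every feature except those the teacher said to ignore."""
    drop = set(eliminated)
    return [{k: v for k, v in x.items() if k not in drop} for x in instances]


if __name__ == "__main__":
    groceries = [
        {"hue": 0.3, "height_cm": 20.0, "is_cylindrical": 1.0, "price": 2.5},
        {"hue": 0.9, "height_cm": 8.0, "is_cylindrical": 0.0, "price": 1.0},
    ]
    print(human_feature_selection(groceries, ["is_cylindrical", "height_cm"]))
    print(human_feature_reduction(groceries, ["price"]))
```

The two interfaces differ in their default: under HFS a feature the teacher never mentions is discarded, whereas under HFR it is kept.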
Our experimental findings provided the key insight that the HFS approach is especially valuable in LfD domains (where training data is limited): given the small set of examples provided by the teacher, it statistically outperformed all computational feature selection approaches tested on the task. The caveat, however, is that individual features must be semantically interpretable or intuitive to the teacher.
The key implication of this work was that the agent could also extract informative features from a human partner, leading to more efficient learning of its concept groundings. Based on the findings from both investigations discussed in this section (Passive LfD), our Active Learning work sought to equip the agent with a reasoning framework for autonomously requesting the feature and instance information it needs, thereby reducing the cognitive burden on the human and making the learning more collaborative.