The Wizard of Oz (WOz) experiment

Performing the Wizards of Oz – Written by Martin Porcheron

The Wizard of Oz experiment (WOz) is a research approach in which an intelligent system is presented to users, typically as part of a research study. Unbeknownst to the user, the presented intelligence is a mirage, with the gubbins of the supposedly intelligent system run by a human operator pulling metaphorical levers. In other words, the intelligence is a fiction. In an article presented at ACM CSCW 2020, and due to be published in Proceedings of the ACM on Human-Computer Interaction, we take a look at our use of the method and unpack the interactional work that goes into pulling off the method. Put simply, we pull back the curtain on the method. This blog post is a bit of a teaser, focusing solely on some of the elements of collaboration that we identified in the article.

Alternatively, instead of (or in addition to) reading this blog post, you can also watch the presentation on YouTube (it was a virtual conference for 2020 for obvious reasons). This presentation includes a short video clip from the data we collected if you want to get a feel for how the study unfolded.

As you can probably guess, the method’s name comes from the L. Frank Baum novel The Wonderful Wizard of Oz. Early uses of the method in HCI took less exciting names like ‘experimenter in the loop’ [1]. A WOz approach offers the ability to prototype and potentially validate (or not) design concepts through experimentation without the costly development time that a full system may require [2]. Approaches have included simulating things such as a ‘Listening Typewriter’ [3] and public service information lookup for a telephone line [4]. In WOz, different elements may be simulated, ranging from database lookup through to mobile geolocation tracking [5]. Due to the recent commercialisation of voice recognition technologies, there is a plethora of literature using the approach for studies in voice interface design, with natural language processing being the simulated component. I’d guess that’s because building natural language interfaces is a costly endeavour (monetarily and timewise).

In our paper, we look at the use of a voice-controlled mobile robot for cleaning, where we simulated the natural language processing of the voice instruction, and conversion of this into an instruction to a robot (i.e. the Wizard listened to requests and controlled the robot). We were running RoboClean as part of a language elicitation study, although that’s not really the focus of the paper. Crucially, our study required two researchers to operate the proceedings: one scaffolded the participant interaction and the other performed the work of the ‘Wizard’, responding to participants’ requests and controlling the vacuum.

Collaboration was key

In the paper we go into much more detail, focusing on the various aspects needed to pull off such a study, starting with how the ‘fiction’ of the voice-controlled robot is established and presented to users, through to how the researchers attend to a technical breakdown while running the study. We progressively establish the fiction as an interactional accomplishment between all three interactants (i.e. the two researchers and the participant).

The researcher, who in our study stands with the participant, introduces the scenario, shows the robot to the participant, and guides them into instructing it (i.e. they scaffold the participant’s involvement in the study). The participant ostensibly talks to and responds to the vacuum. The Wizard—who is listening—responds to the request, in accordance with the fiction presented by the researcher and the notions of what a voice-controlled vacuum robot might reasonably respond to. It’s the Wizard whom the participant is really instructing in such a study (as the voice-controlled robot is but a fiction). The researcher standing with the participant then must performatively account for the actions taken by the Wizard according to that fiction. In other words, whatever ‘the robot does’, the researcher must attribute its actions to the robot to conceal the machinations of the Wizard.

There are other challenges, of course, that make this harder: the Wizard must respond to participants’ requests quickly, and in a way that is consistent with the fiction, in order to ensure the methodological validity of the study. We also discuss a situation in the article where there is a technical glitch with the robots, requiring both researchers to work together in an improvised manner to uphold the secrecy of the Wizard, while trying to collaboratively resolve the issues they face.

Given the dramatic naming of the approach, we describe this accomplishment as a triad of fiction, taking place on the ‘front stage’ (with the Wizard working ‘backstage’). Around the same time, others also referred to this as ‘front channel’ and ‘back channel’ communication [6]. See the figure for how we pictorially represent the communication between the various interactants in our study.

Practical takeaways

Above I’ve focused on the collaboration required to pull off the study, but we also devote a fair chunk of the article to detailing the practical steps we took in implementing the study design and running the study. With this, we discuss how we used various technologies, piecing them together to present a believable ‘voice-controlled robot’. We had a shared protocol document that both the researcher and the Wizard used to maintain awareness of each other’s actions, and an outline script that detailed the sorts of requests that the robot would respond positively (or not) to; this was progressively updated throughout the studies. While we frame running a WOz study as a performance, we were keen to stress the methodological obligations involved too: the performance must be undertaken according to methodologically valid research practice. We argue this requires meticulous care and attention, and that this is driven by the collaboration of the researchers throughout.
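To make the idea of an outline script a little more concrete, here is a minimal sketch of how one could be written down as data and consulted by a Wizard. It is purely illustrative: the request patterns, action names and the `lookUpRequest` helper are all assumptions made for this example and are not taken from the protocol document or script used in the study.

```javascript
// Hypothetical outline script. Every pattern and action name below is an
// illustrative assumption, not the study's real script.
// Entries are checked in order, so "stop" is listed before "clean" to make
// sure "please stop cleaning" is treated as a stop request.
const outlineScript = [
  { pattern: /\b(stop|pause|halt)\b/i, action: "stop", accept: true },
  { pattern: /\b(dock|go home|charge)\b/i, action: "return-to-dock", accept: true },
  { pattern: /\b(clean|cleaning|vacuum|hoover)\b/i, action: "start-cleaning", accept: true },
  // Requests the fictional robot should decline, so that the Wizard
  // responds the same way for every participant.
  { pattern: /\b(mop|wash|polish)\b/i, action: null, accept: false },
];

// Return how the 'robot' should respond to a participant's utterance.
// Anything the script does not cover is declined by default.
function lookUpRequest(utterance, script = outlineScript) {
  for (const entry of script) {
    if (entry.pattern.test(utterance)) {
      return { accept: entry.accept, action: entry.action };
    }
  }
  return { accept: false, action: null };
}
```

Keeping the script as plain data would make it easy to update between sessions, in the spirit of the progressively updated script described above, although a shared document serves the same purpose.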




My internship on the RoboClean project – Jane Slinger

My internship with the RoboClean team involved developing a custom Alexa skill to control Neato vacuum cleaners by voice. This will enable further development to link with the voice interface if required, as the other aspects of the project involve web systems and multi-agent systems. I also helped run a study to find out how users would interact with the potential system in a lab environment.

I enjoyed the work as it was in an area that interested me and had some challenges in the code to overcome, leading me to learn more about how the systems worked in order to explore different solutions. It was nice to be able to build on the Alexa development skills I learnt in my 3rd year project and to add linking to the Neato API through HTTP requests and a 3rd party library. This included setting up Account Linking on the Alexa skill and then adapting some of the code from libraries to work with Node.js on the backend instead of the front-end JS-based methods that were already in place.

Designing the interactions with the robot and the user was also very interesting as I wanted to make sure that the system would prompt for the necessary information about the robot, and location to clean, without becoming annoying for the user.
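The prompting behaviour described above can be sketched as a small slot-filling step: ask only for the piece of information that is still missing, and confirm once everything is known. This is a minimal sketch in plain JavaScript rather than the skill's actual code or the Alexa Skills Kit API; the slot names, prompt wording and the `nextStep` function are assumptions made for illustration.

```javascript
// Hypothetical slots the skill needs before it can send a cleaning request.
// The names and prompt wording are illustrative assumptions.
const requiredSlots = [
  { name: "robot", prompt: "Which robot should I use?" },
  { name: "location", prompt: "Where would you like it to clean?" },
];

// Decide what to say next, given the slots filled so far.
// Only the first missing slot is asked for, so the user is never
// re-prompted for information they have already given.
function nextStep(filled) {
  const missing = requiredSlots.find((slot) => !filled[slot.name]);
  if (missing) {
    return { type: "ask", slot: missing.name, speech: missing.prompt };
  }
  return {
    type: "confirm",
    speech: `Okay, sending ${filled.robot} to clean the ${filled.location}.`,
  };
}
```

For example, `nextStep({ robot: "Rosie" })` asks only for the location, while a request that already names both the robot and the location goes straight to the confirmation.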

The internship will help with my studies and future work as it has given me experience of working with a research team, building on areas I had some experience in as well as expanding to other technical skills that I hadn’t used before, and will be useful in the future.

Written by Jane Slinger

I-CUBE call for Participants

We are looking for participants for the I-CUBE project’s first study, taking place at the School of Computer Science, this November on Jubilee Campus.

This initial call is for employees of the University and members of the public, more generally. We will make a separate call for student participants. All participants need to be 18 years old or over.

If you are interested in taking part please use this Doodle link: to select your appointment and participate in our study.

The study’s task is to instruct a trainee ‘robot’ to sort a pile of clothes into separate washing loads according to a detailed list of tasks. This is to examine human interactions in a prescribed situation. There is a short questionnaire-interview to complete after the task.

You will be both video and audio recorded while instructing and responding to the trainee ‘robot’ as well as audio-recorded for the interview.

The experiment is expected to take approximately 45 minutes of your time and you will be reimbursed with £10 worth of shopping vouchers.

Charlotte Gray shares her experiences of working on RoboClean

I was introduced to the RoboClean project at Horizon whilst interning with the Advanced Data Analysis Centre. The project investigates the ways in which end-users interact with a robot vacuum cleaner and how a robot responds to user utterances; the aim being to inform its effective design and use within food factories.

I was invited to continue my internship for 5 more weeks within Horizon to help with the analysis of data collected through an elicitation study. Overall, this has been a really valuable and rewarding experience. Coming from an academic background in Sociology, I found that working closely with researchers specialising in Computer Science exposed me to different research aims and challenges from those I had previously encountered. This has been insightful for me as it has not only helped me develop new skills in research analysis and interview techniques, but has also let me apply the principles of a range of research methods gained during my academic studies over the past two years to cutting-edge technological developments.

I have been responsible for transcribing participants’ audio data, analysing visual data, and creating a summary written report of participants’ interview responses. The focus of the report was on the benefits, limitations, and disadvantages experienced by users from the user-robot interactions. Attending a range of team meetings has also been beneficial in understanding interactions within a work environment, especially where individuals are working together from across a range of disciplines. Combined with the skills I have learned in workload prioritisation and management, this has made me confident to face future work situations and dilemmas. Additionally, I have written literature reviews on the topic of human-robot interaction. Being able to explore these new topics has also helped me see how issues explored in Sociology are becoming increasingly influenced by the world of technology, for example, how individuals’ day-to-day lives are mediated by the introduction of robots to the workplace. The multidisciplinary projects throughout Horizon have therefore also been interesting to work alongside, clearly showing the benefit of collaborative projects in producing innovative findings.

Contributing to a research project which is aiming for publication in a research journal has been hugely rewarding and exciting, and has made the idea of working in a similar environment after graduating a lot more persuasive.

Written by Charlotte Gray

AI Technologies for Allergen Detection and Smart Cleaning Food Production Workshop

In collaboration with the AI3 Science Discovery (AI3SD) and Internet of Food Things (IoFT) EPSRC Networks, the RoboClean team ran a workshop in London on the 17th of October. The focus of the workshop was to discuss how digital technologies such as AI, sensors and robotics can be used for enhanced allergen detection and factory cleaning within food production environments. The workshop was well attended by a range of stakeholders from industry, academia and organisations such as the Food Standards Agency. The morning of the workshop had three speakers. Nik Watson from the University of Nottingham gave a talk on the future of factory cleaning. This talk covered a range of research projects from the University which developed new digital technologies to monitor and improve factory cleaning processes. The second talk was from AI3SD lead Jeremy Frey from the University of Southampton. Jeremy’s talk gave an introduction to AI and covered a range of new sensors which could be used to detect the presence of allergens in a variety of food products and environments. The final talk was delivered by Martin Peacock from Zimmer and Peacock, a company who develop and manufacture electrochemical sensors. Martin gave an introduction to the company and the technologies they develop before demonstrating how their sensor could be connected to an iPhone to determine the hotness of chilli sauce. Martin’s talk finished by discussing how electrochemical sensors could be used to detect allergens within a factory environment. The afternoon of the workshop focused on group discussions on the following four topics, all related to allergen detection and cleaning within food production:

  • Data collection, analysis and use
  • Ethical issues
  • Cleaning robots
  • Sensors

Each group had a lead; however, delegates moved between tables so they could contribute to more than one discussion. At the end of the workshop the lead from each group reported back with the main discussion points covered by the delegates. The delegates on the ‘robotics’ table reported that robots would play a large role in the future of factory cleaning as they would free up factory operators to spend time on more complicated tasks. The group felt that the design of the robots was essential and discussed that new factories should also be designed differently to facilitate robot cleaning more easily. The group also thought that effective communication with the robot was a key issue which needed further research. The ‘sensors’ group reported that any new sensors used to detect allergens or levels of cleanliness would need to fit into existing regulations and practices, but would be welcomed by the industry, especially if they could detect allergens or bacteria in real time. The ‘data’ group reported that there was a need for data standards relevant to industrial challenges, and also a need for open access data to enable the development of suitable analysis and visualisation methods. The ‘ethics’ group discussed numerous key topics including bias, uncertainty, transparency, augmented intelligence and the objectivity of AI.