SuSPECT – Scaffolding Student PErspectives for Critical Thinking

What’s the problem?

Biased information is a societal problem that affects all of us. Examples include the recent European referendum vote in the UK, and the US presidential election. There has been a surge in the number of online resources that are misleading or false. A recent white paper described the ability of learners to assess such information sources as “dismaying,” “bleak” and “[a] threat to democracy.” Teaching facilitators have the ability and responsibility to educate their learners, not only in their areas of expertise, but also in how to think in a balanced way about the information they consume. While it has long been the role of educational institutions to nurture such skills, the means to do so need to evolve in pace with the ecosystem that our students are learning in.

While critical thinking is often implicitly integrated, or assumed in education, SuSPECT helps make critical thinking in education explicit, which then enables a better understanding of its importance in other domains like assessing online information.

What are we doing about it?

In response, this project addresses how to help students critically evaluate and respond to online resources. SuSPECT is a short project funded by the Leiden-Delft-Erasmus Centre for Education and Learning, running from March 2017 to March 2018. It evaluates the efficacy of debate in the classroom, building on the existing rbutr online argumentation system. The aim is to help learners not only assess the veracity of online resources, but also develop more nuanced and balanced thinking.

No really, this is what we are DOING about it.

  • Improving the Rbutr system to suggest timely rebuttals, and to use crowdsourcing to annotate potential rebuttals. The use case is a person who finds a website and wants to find rebuttals for it. This part of the system mines Twitter and Reddit for occurrences of this specific URL and for responses that contain other URLs. The user then annotates these URLs as rebuttal, contrary, or irrelevant. The annotations are then used to rank potential results for other users who look for the same URL later. We are in the process of hiring a developer to work on this task.
  • Working with lecturers to include debate in their curriculum. We have been discussing how to introduce the use of Rbutr and debate into two courses: IT & Values (Delft); Ethics, Culture, and Biotechnology (Leiden). We are also in deliberation with the Erasmus institute (Rotterdam) on how the intervention may be introduced to a MOOC called Deception Detox. Each course has a different curriculum and learning outcomes, which leads to interesting differences in how debate can be introduced.

For example, IT & Values is expected to have a large number (~100) of students, which means that several tutorial groups will discuss the same topic in parallel and may support each other with materials via Rbutr.

Ethics, Culture, and Biotechnology is by contrast more condensed, giving us the opportunity to experiment with a flipped classroom approach where students prepare their arguments ahead of time. Lecturers in each course prepare resources as a basis for debate (both for and against), but in the case of the flipped classroom this sets the bar higher. It also raises the bar for training the students in debate with validated resources before they do their own research (e.g., using Rbutr).

  • Doing experiments. Together with colleagues at EPFL we are investigating the impact of teaching students about critical thinking (short presentation on the Baloney Kit) on their opinions on typical debate topics online (such as vaccines causing autism). Students voted for topics using a mobile-based voting tool (SpeakUp), and we already see some promising results for this very short and simple intervention. Asking students to debate appears to help more students think critically, although this does not seem to be the case for students with strong opinions.  We are currently writing up the results, and look forward to sharing these with you. We also have loads of other ideas for experiments, so let us know if you want to collaborate (we have some small funds for running experiments).
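To make the crowd-annotation idea from the first item above more concrete, here is a minimal Python sketch of how user labels could be turned into a ranking. It is only an illustration under stated assumptions: the function name `rank_candidates`, the scoring rule (rebuttal votes minus irrelevant votes, with contrary votes counted as neutral), and the example URLs are all hypothetical and are not taken from the Rbutr system.

```python
from collections import Counter, defaultdict

# The three labels mentioned in the text.
LABELS = {"rebuttal", "contrary", "irrelevant"}


def rank_candidates(annotations):
    """Rank candidate URLs from (candidate_url, label) pairs.

    Assumed scoring rule (hypothetical): each "rebuttal" vote adds one
    point, each "irrelevant" vote subtracts one, "contrary" is neutral.
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = defaultdict(Counter)
    for url, label in annotations:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
        counts[url][label] += 1

    def score(url):
        return counts[url]["rebuttal"] - counts[url]["irrelevant"]

    return sorted(counts, key=lambda url: (-score(url), url))


# Made-up annotations for candidate responses to one source URL.
annotations = [
    ("https://example.org/a", "rebuttal"),
    ("https://example.org/a", "rebuttal"),
    ("https://example.org/b", "rebuttal"),
    ("https://example.org/b", "irrelevant"),
    ("https://example.org/c", "contrary"),
]
ranked = rank_candidates(annotations)
print(ranked)
```

A real system would likely also weight annotators and recency; the point here is only that earlier users' labels can order the results shown to later users.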

Looking forward to sharing progress on all of these aspects with you soon!

Nava and the team


Special Issue on Human Interaction with Artificial Advice Givers

Many interactive systems in today’s world can be viewed as providing advice to their users. Commercial examples include recommender systems, satellite navigation systems, intelligent personal assistants on smartphones, and automated checkout systems in supermarkets. We will call these systems that support people in making choices and decisions artificial advice givers (AAGs): they propose and evaluate options while involving their human users in the decision-making process. This special issue addresses the challenge of improving the interaction between artificial and human agents. It answers the question of how an agent of each type (human and artificial) can influence and understand the reasoning, working models, and conclusions of the other agent by means of novel forms of interaction. To address this challenge, the articles in the special issue are organized around three themes: (a) human factors to consider when designing interactions with AAGs (e.g., over- and under-reliance, overestimation of the system’s capabilities), (b) methods for supporting interaction with AAGs (e.g., natural language, visualization, and argumentation), and (c) considerations for evaluating AAGs (both criteria and methodology for applying them).

The full special issue can be found here:

ENSURE – ExplaiNing SeqUences in REcommendations

I have been awarded a Technology Fellowship at TU Delft, and will be joining the Web Information Systems group, Faculty of Electrical Engineering, Mathematics and Computer Science (with Geert-Jan Houben, Claudia Hauff, Alessandro Bozzon et al.), as an Assistant Professor from the 13th of February 2017!

This fellowship is focused on explaining sequences of recommended items (rather than single items or even sets).  This enables me to fund a Research Fellow to work on this challenge with me for 2 years. This job will be advertised more formally shortly, but to give you a teaser….

The research agenda involves: 

  • Gaining an understanding of people’s concerns regarding personalization for sequences of recommended items.
  • Gaining an understanding of people’s views on the kinds of explanations that alleviate their concerns and help them to make good decisions.
  • Producing guidelines for algorithms for constructing explainable recommender sequences.
  • Developing algorithms for explaining sequences containing both novelty and trade-offs effectively, while considering privacy concerns. This includes investigating the role of context and personal characteristics.
  • Facilitating a dialogue between policy makers, researchers, and the general public regarding the findings above.

Job Requirements:

You hold a PhD in computer science or a related discipline. You have a track record of scientific excellence in the field(s) of recommender systems, user-modeling, multi-objective optimization, and/or human-computer interaction. You must demonstrate either an ability to design algorithms for sequences of items, or deep experience in designing interactions with recommender systems. You will be expected to lead or strongly contribute to academic publications, contribute to grant proposals, and interact with stakeholders outside academia (e.g., end users, business, and public policy). Strong verbal and written communication skills are therefore also required.

If you (or someone you know) are interested in joining TU Delft and working with me on this challenge, I’d love to have an informal chat to see if we have a fit.  I can be reached at:

Toward Ethical Personalization

When we work with data analytics we often lose sight of the context in which our users and customers live their lives. As data scientists, we focus on collecting data, filtering it, and improving our predictive accuracy. After all, getting the prediction right is a challenge in and of itself. We do not want to make incorrect predictions, or miss out on correct predictions. However, there are much subtler ethical questions to consider when applying analytics to personal data.

At the end of 2016, I organized two events aimed at addressing some of these questions. The first was aimed at a more general audience. Together with Dr. Paolo Palmieri, I organized an information session titled What is the Internet Hiding From You…?, as part of the ESRC Festival of Social Science. At the end of the session participants contributed to focus groups where we discussed:

  • Which benefits they would like to get from personalized services?
  • Which information they were willing to share, and when personalization happens?
  • How they want a computer to communicate to them the information it has used?

While this was a small and self-selecting sample, I was struck by how distrustful the participants were of personalization services, and by the need for increased transparency and communication between personalization services and users.

The second event focused on industry, and key players in Big Data in Scotland. In a panel on “Data Analytics: Balancing Insight, Privacy & Trust” at the Big Data Conference, we discussed the following issues:

  • When do analytics become too intrusive? When can we make inferences across data sources, or inferences that users did not consent to being made when they initially provide the data? (Video)
  • How should we make algorithmic biases visible to users? How do we avoid filter bubbles like the one that happened during Brexit and the presidential vote in the US? How can explanations be used to improve transparency? (Video)
  • Is there going to be a swing in the balance of power towards individuals / consumers? How do we balance this with businesses’ need to be competitive? (Video)

The panel members represented key stakeholders in policy and industry:

  • Ken Macdonald, Head of ICO Regions, Scotland, NI & Wales, Information Commissioner’s Office
  • Martin Squires, Global Lead, Customer Intelligence and Data, Boots
  • Dr. Hannah Rudman, Director, Rudman Consulting Limited

The discussions in the panel highlighted a corporate interest in personalizing in a way that is beneficial to users. From the conversation it appeared that many industry players are less aware of more complex and delicate ethical challenges such as data linking, using data for purposes other than those for which it was initially supplied, or the fact that an algorithmically correct personalization is not always the best from a user perspective (see e.g., the Target story).

The panel also confirmed that these concerns are recognized on a policy level by the Information Commissioner’s Office (ICO) in the UK. The new EU General Data Protection Regulation (GDPR) coming into effect in 2018 recognises privacy as a legal right, and includes a “right to explanation” whereby a user can ask for an explanation of an algorithmic decision that was made about them. Despite the planned UK exit from the EU, the ICO confirms that comparable regulations will be put into effect in the UK, and that the ICO will have legal capacity to enforce compliance with these regulations. Privacy policies will need to be geared towards the customer and expressed in clear and plain language.

Analytics platforms hold a great deal of power through their access to the usage data of individuals. It is now time to start using that power wisely. The public is justifiably concerned, and it is our responsibility to think critically about what data we need to collect and store, and for which purposes. Computers can make and collect data and run algorithms, but humans working with big data are the ones who establish the analytical programmes, professional practices, and codes surrounding them. Overlooking the person-centred ethical issues may result in negative social impact.

Let there be a balance between the innovation and economic opportunity of big data, and respecting privacy and human rights within open, tolerant societies. To allow this to happen, we need to work together to establish best practices, and make a record of positive case studies where these have been observed. This is a conversation that is going to need all hands on deck: customers, policy makers, data analytics companies, as well as academic researchers. Let’s get cracking!

What is the internet hiding from you?

Our ESRC Festival of Social Science event proposal has been accepted! We will be running focus groups and an information session on the topic: “What is the internet hiding from you?” on November 8th, 2016. Event held at the Executive Business Centre. Afternoon session 2.30-5pm, OR Evening session 6-8:30pm (two slots of the same sort of session).

Most of us know that our personal data is being used to filter our Facebook ‘timeline’ or that Amazon personalises which items it shows to us. However, as users, we have not always agreed to that personalisation, and do not know how our personal data is being used. It’s not surprising that many of us are unsure whether we can trust the internet and how our information is shared. This workshop gives members of the public a chance to find out more about the issues and share their views, potentially shaping the future of big data research.

More details and registration here:

Big Data and Cloud Computing

This term I’ve been teaching Big Data and Cloud Computing as part of a Master’s degree in Applied Data Analytics at Bournemouth University.

The students on this course worked on exploratory data analysis, using large, real-world datasets. We used R both for analysis and to create a web application that supports interactive visualization. This was a great way for the class to learn hands-on about issues with large datasets, including heterogeneity across data sources, and the importance of being able to host and access the data (one of the groups reads JSON data from a live feed).

Each team covered a different interesting problem area including: climate change, crime rates in the Camden Borough of London, live earthquake updates, and live tweets of music listens (#nowplaying) across the globe. I am really proud of their work and thought you might want to have a look! Just click on the topic name, the images or the links to try out the systems.


This application is used for both visualization and exploratory analysis of earthquake data. The earthquake data are downloaded online, in real time, in GeoJSON format. The application then processes them to produce a final dataset that includes only the necessary variables. The names of the variables are Local.Time, magnitude, significance, place, longitude, latitude and depth.
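As a rough illustration of the reduction step described above, the sketch below flattens a GeoJSON feature collection into rows that hold only the listed variables. It is written in Python rather than the R the students used. The property names (`time`, `mag`, `sig`, `place`), the use of the third coordinate as depth, the millisecond timestamp, and the sample record are all assumptions modelled on common earthquake feeds, not details taken from the students’ application. The time is converted to UTC here; a local-time conversion would need the viewer’s time zone.

```python
from datetime import datetime, timezone


def flatten_features(collection):
    """Reduce a GeoJSON FeatureCollection to one flat dict per feature."""
    rows = []
    for feature in collection.get("features", []):
        props = feature.get("properties", {})
        # GeoJSON positions are ordered longitude, latitude, then an
        # optional third value (assumed here to be depth).
        lon, lat, depth = feature["geometry"]["coordinates"][:3]
        rows.append({
            # Assumed: timestamp in milliseconds since the Unix epoch.
            "time": datetime.fromtimestamp(props["time"] / 1000, tz=timezone.utc),
            "magnitude": props.get("mag"),
            "significance": props.get("sig"),
            "place": props.get("place"),
            "longitude": lon,
            "latitude": lat,
            "depth": depth,
        })
    return rows


# A made-up record standing in for one item of the live feed.
sample = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"time": 1463356800000, "mag": 4.5, "sig": 312,
                       "place": "example location", "unused": "dropped"},
        "geometry": {"type": "Point", "coordinates": [142.5, 38.2, 10.0]},
    }],
}
rows = flatten_features(sample)
print(rows[0])
```

Keeping only the needed fields at load time keeps the dataset small, which matters when the feed is refreshed while the user is interacting with the map.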



The ‘Crime’ dataset is gathered from Camden Police crime reports from January to June 2015. The data is geocoded with longitude and latitude information and mapped using the Leaflet map widget. Camden Police data can be found here

Main packages used are Shiny, Leaflet, shinydashboard, rpivotTable



Climate change:


Music listening patterns across the globe

A smaller data set enables quicker navigation through the app; even if the statistics become meaningless, it gives you a better overview.

The full data set has 356 845 observations; the page takes around 35 seconds to load.


While this is the first time these students have worked with R, I’m sure you’ll agree they’ve done a terrific job of visualizing complex real world data!


Workshop in Glasgow

On Friday the 31st of May I went to a workshop in Glasgow to meet with other Scottish researchers who work in the area of information retrieval. The idea behind this strand of research is to help someone get access to the information they need from a (large) collection of information.
In some ways that is what we are doing in SAsSY: we are helping a person get access to the information that is key for them in a large plan, where lots of potential options are possible and where many different people (with different roles and preferences) are involved.

All of the talks were very interesting. Some of the topics covered were:
supporting business processes (including helping researchers like us apply for funding to keep doing our work!), detection of events in cities, filtering important messages from tweets, mobile phones that retrieve information in an appropriate context (e.g. when you are in a specific location), finding appropriate images for the people who actually work with media, and surprising but helpful discovery of academic articles.

I was particularly struck by the work of Martin Halvey, who found that people made the most mistakes when they were asked to judge the relevance of information that was only partially relevant. They also made these incorrect judgements more quickly than they made cut-and-dried decisions. This highlights an area where explanations could be particularly useful.

It was also clear to me that many of the ongoing projects would benefit from increased transparency and a level of aggregation which is adapted to the user experience and previous knowledge. For example, Tiphaine Dalmas spoke about Spacebook, a mobile application which supports pedestrian navigation and exploration in urban environments. It uses Text-to-Speech to tell people about the city they are in as they walk around. Lengthy descriptions can often be difficult to listen to, and I can see how adapting to a user (e.g. one who has already seen a point of interest) could help shorten or summarise the information so that people get the information most relevant to them.

Likewise, Matt-Mouley Bouamrane spoke about the importance of supporting information management for surgical patients, especially from hospital (secondary care) to their general practitioner (primary care). What really struck me was that it is not unusual for a GP to phone a hospital after discharge since they have no record of the surgery the patient has just undergone. Computer generated summaries of time-series data and workflows could greatly improve patient care!

It was wonderful to meet with other like-minded researchers, who are as interested in the data processing side of things as they are in how to genuinely make this useful for people!

mturk related links

Google apps

google app engine mysql:

mysql cloud has temporary free license, you’ll need the commandline tool:

gsutil is needed to put files (like database dumps) on the cloud – info on how to use this here.

.boto file needs proxy set, but should work after first authentication session.

if eclipse doesn’t want to recognize config for sql cloud, download eclipse data tools platform


matt’s notes on mturk:

commandline tools/sandbox for env. variables: GetStarted.html