Chatbot

DTU students develop voice-controlled chatbot for art exhibition

Three students have assisted the National Gallery of Denmark (SMK) in creating a digital version of the sculptor Anne Marie Carl-Nielsen, who died in 1945. DTU Compute hopes the chatbot will demonstrate how generative AI can make learning more interesting.

Digital Curator at SMK Majken Overgaard and MSc student at DTU Compute Niels Raunkjær Holm. Credit: Hanne Kokkegård.
All three students are studying abroad for the autumn semester. Before Niels Raunkjær Holm left Denmark, he had time to visit and see the chatbot at the National Gallery of Denmark. Watch the video (on YouTube) and read the fact box for the technical details.

The chatbot’s dataset

Since the chatbot is built on GPT-4o, a closed language model, the students could not train it directly on biographical material such as diary entries, letters, articles, book excerpts, or other material about Anne Marie Carl-Nielsen. Instead, that material is supplied to the model when a question is asked.

The source material about Anne Marie Carl-Nielsen has been digitised and translated into modern Danish, so that it resembles the language the model was trained on.

The texts have been divided into short segments, each covering only one topic – a kind of brief summary of a specific subject.

When you input text into an AI model like ChatGPT, the text is converted into vectors: rows of numbers that capture the essence of the text, such as its meaning and context. The AI model then uses these vectors for tasks like classification, translation, or generating responses. Once all the source material for the SMK chatbot had been collected, divided into segments, and processed by the language model, it was turned into vectors and stored in a vector database.
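As a toy illustration of the idea that text becomes "rows of numbers", here is a minimal bag-of-words vectoriser. The real chatbot uses learned embeddings (OpenAI's Ada-002, per the fact box below); the vocabulary and example sentence here are made up, and the sketch only shows the principle of turning text into a vector.

```python
# Toy sketch: turning text into a vector. Real systems use learned
# embeddings (e.g. OpenAI's text-embedding-ada-002); this bag-of-words
# version only illustrates the idea of "text -> row of numbers".

VOCAB = ["sculptor", "horse", "bronze", "denmark", "music"]

def to_vector(text: str) -> list[float]:
    """Count how often each vocabulary word occurs in the text."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

v = to_vector("The sculptor cast a bronze horse in Denmark")
print(v)  # one number per vocabulary word
```

Texts with similar wording end up with similar vectors, which is what makes the distance comparison in the next step meaningful.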

When a question (prompt) is asked, it is also converted into a vector and compared with the vectors in the vector database. The language model then generates a response based on up to six passages whose vectors lie closest to the question.

The students have adapted the language model to the chatbot through an overarching prompt – a kind of basic instruction. It includes biographical information with basic facts about birth year, birth place, and family. But also that the chatbot should play the role of a deceased artist and respond in a certain way. This way, there is control over what the software/program is used for.

If the audience asks something outside the context, the chatbot says it cannot provide an answer.

Photos from the exhibition and DTU Compute. Credit: Hanne Kokkegård

The art of deep fake

This is the first time SMK has created a chatbot to interact in entirely new ways with materials, etc.

The students are delighted to see how their learning can be practically applied in a more untraditional way.

“It has been super exciting. It also related to our bachelor’s project, where we worked really well together. That the technology is now being used in an art exhibition is really fun to be part of,” says Niels Raunkjær Holm, who was invited to the opening last Saturday to talk about his, Michael’s, and Andreas’s work with the chatbot.

Professor Morten Mørup highlights that the chatbot gives a taste of the near future.

“It has great potential in terms of creating entirely new ways to bring exhibitions to life and communicate in new ways in new formats. The fact that you can talk to a system and get an immediate response is something we will generally see much more of, also in smartphones and everyday life. Today we write with a chatbot, but soon, we will talk to it, which will give a more natural experience of communication,” says Morten Mørup.

At the museum, the chatbot has its own room where visitors can read about the ideas behind the communication – and try out the chatbot themselves, which is placed in a kind of lectern.

The visual backdrop is a screen where the asked questions are transcribed, and then the chatbot responds. Immediately after the response, the screen shows a kind of spherical version of Anne Marie Carl Nielsen in black and white dots (like pixels), which fade in and out in time with soft music. According to Majken Overgaard, the museum wants to clearly point out that it is not reality – but a digital artistic approach.

This emphasis is important in a time when AI and fake news, for example, are used politically in election campaigns and open the possibility of influencing opinions, explains Morten Mørup.

“You need to be aware of what is real and what is fake in a time when generative technologies can imitate both living and dead people. And where the technologies are moving quickly and getting better. We hope that our students’ contribution to the exhibition can help show how generative AI and deep fake technologies can be used in a way that can make learning more interesting.”

As bachelor students, Niels studied Software Technology, while Michael and Andreas studied Artificial Intelligence and Data, graduating in June 2023. Now, Niels is studying for the master's in Mathematical Modelling and Computation, and Michael and Andreas are studying for the master's in Human-Centered Artificial Intelligence.

The chatbot was developed as a special course, for which they received 10 ECTS credits.

AI Programmes used in the chatbot

 

Speech-to-text:

  • Two different models depending on the language. If the spoken question is in Danish: chcaa/xls-r-300m-danish-nst-cv9. English: OpenAI Whisper Base.

Language models: 

  • For offline generation (i.e., done in advance for use in the database) of synthetic summaries and relevant questions for each passage (these are generated to increase the accuracy of retrieval) in our collection of documents about Anne Marie Carl-Nielsen, we used GPT-4-Turbo.
  • For our relevancy evaluator, which assesses whether questions are relevant to the conversation (i.e., for a conversation with AMCN), we used GPT-4o.
  • For our text generation to answer questions, we used GPT-4o.

Retrieval Augmented Generation: 

  • Passage retrieval: We automatically divided each document in our document corpus into small passages of about 3-5 sentences using an LLM. Then we generated a simplified summary for each passage, as well as 5 relevant questions that the passage answers using GPT-4-Turbo.
  • We then generated embeddings of each summary and each individual question using OpenAI Ada-002 (these are generated to hopefully optimise retrieval by increasing the similarity between the user’s question and relevant passages).
  • When we run retrieval online (online here means it happens when a user interacts with the system), we find the 3 most relevant passages based on the L2 norm between the embeddings of the synthetic questions and the posed question (thus a total of 6 passages).
  • Relevancy Evaluation: We use GPT-4o to assess whether a question is relevant. A system prompt is provided that describes the role, the 6 relevant passages, the previous 2-6 conversation messages, the posed question, and a message asking if the question is relevant. If the question is not relevant, we provide a predefined message to the user.
  • Prompts: Our prompts are primarily based on a qualitative analysis of the output. If we encounter scenarios where the output is unsatisfactory, we use ChatGPT-4 to optimize our prompt by describing the desired output and our current prompt. Additionally, 5 prompts for our relevance evaluator were tested on 20 relevant and 10 non-relevant questions, after which we used the prompt with the highest accuracy compared to the desired output.
  • Text-to-speech technology: Google “da-DK-Wavenet-E” for Danish outputs and Google “en-US-Journey-F” for English outputs. Both are through Google’s API.
  • Voice conversion technology: Here we use Free VC 24kHz. The underlying data is voice actor Lotte Andersen, who has recorded 24 minutes of Danish speech for Danish VC and 33 minutes of English speech for English VC.