Chatbot

DTU students develop voice-controlled chatbot for art exhibition

Three students have assisted the National Gallery of Denmark in creating a digital version of the sculptor Anne Marie Carl Nielsen, who passed away in 1945. DTU Compute hopes the chatbot will demonstrate how generative AI can make learning more interesting.

Digital Curator at SMK Majken Overgaard and MSc student at DTU Compute Niels Raunkjær Holm. Credit: Hanne Kokkegård.

Tuesday 03 September 2024

Hanne Kokkegård

Until 8 December, the National Gallery of Denmark (SMK) in Copenhagen is showcasing how artificial intelligence can help reintroduce forgotten female artists into art history. In the exhibition “Against All Odds - Historical Women and New Algorithms,” visitors can interact with a digital version of the renowned sculptor Anne Marie Carl Nielsen (1863-1945) through a chatbot developed by three students from DTU Compute.

Unlike the basic version of ChatGPT, which generates responses based on vast amounts of data, the chatbot at SMK responds using a selected dataset consisting of historical sources such as letters, diaries, and recent research on Anne Marie Carl Nielsen’s life and work. The chatbot searches for answers in a closed (vector) database and does not have internet access.

Normally you type questions to chatbots, but here visitors can speak directly to the chatbot in Danish and English, and the chatbot also responds verbally. To give the chatbot more ‘personality,’ it responds in a voice tone that could resemble the sculptor’s. This is the first time SMK is using such a type of chatbot as a tool to convey and provide an overview of extensive archival material and find new ways into it.

Students from DTU Compute have previously collaborated with SMK on the use of artificial intelligence, and when Professor Morten Mørup received a new inquiry a year ago, he and his colleague, Associate Professor Tue Herlau, immediately thought of the three students Niels Raunkjær Holm, Michael Alexander Harborg, and Andreas Holmer Bigom.

They had just completed a bachelor’s project where they created a prototype to give a voice to deaf people who use sign language for communication, so they can talk to people who do not know sign language. That model uses visual computing to read sign language (visual signatures), translates it into text, and lets an artificial voice read the text aloud, thus giving the deaf a voice. So it was somewhat similar to SMK’s project, where the deceased sculptor Anne Marie Carl Nielsen was to be given a voice.

“In their bachelor project, Niels, Michael, and Andreas excelled in working with advanced AI technologies, and they were already familiar with some of the technologies that needed to be put together in a system to make this possible,” explains Morten Mørup.

The three now MSc students worked on the chatbot throughout the autumn of 2023. In December, the supervisors and Digital Curator at SMK, Majken Overgaard, were presented with a proof of concept of the students’ chatbot. They then collaborated with the company Yoke, which created all the visuals for the art exhibition.

“We have continuously built on the chatbot, found slightly better AI models for some parts of the system, and optimised it, so it could run stably and operate on the hardware available at SMK,” says Niels Raunkjær Holm.

The chatbot in brief

The students’ solution consists, in short, of a chatbot that, based on material from and about Anne Marie Carl Nielsen, can talk with visitors. The chatbot consists of a number of different AI components that are put together.

Overall, their solution is based on a speech-to-text AI system combined with the language model GPT-4o and expanded with Retrieval Augmented Generation (RAG) based on material from and about Anne Marie Carl Nielsen to produce text responses.

The responses are shaped using a text-to-speech AI system and an AI voice conversion system to give a response back to visitors in a voice tone that could match Anne Marie Carl Nielsen. The latter technology is the same used in deep fake videos to make it sound like a person is saying something they never actually said.

The voice in the chatbot is based on the actress Lotte Andersen, who has recorded long texts in Danish and English to provide the AI model with sufficient voice material to learn the signature of her voice and use it for voice conversion.

All three students are studying abroad for the autumn semester. Before Niels Raunkjær Holm left Denmark, he had time to visit and see the chatbot at the National Gallery of Denmark. Watch the video (on YouTube) and read the fact box for the technical details.

The chatbot’s dataset

Since the underlying language model for the chatbot is GPT-4o, the students could not train the model specifically with biographical material such as diary entries, letters, articles, book excerpts, or other material about Anne Marie Carl Nielsen.

The source material about Anne Marie Carl Nielsen has been digitised and translated into modern Danish, corresponding to the language models’ training data.

The texts have been divided into short segments, each covering only one topic – a kind of brief summary of a specific subject.

When you input text into an AI model like ChatGPT, the text is converted into vectors. Vectors are rows of numbers that capture the essence of the text, such as its meaning and context. The AI model then uses these vectors to perform tasks like classification, translation, or generating responses. So when all the source material for this SMK chatbot has been found, divided, and processed by the language model, it has been turned into vectors stored in a vector database.

When a question (prompt) is asked, it is also converted into a vector, and then it is compared with the vectors in the vector database. The language model then generates a response based on up to six passages in the vector space that are closest to the question.

The students have adapted the language model to the chatbot through an overarching prompt – a kind of basic instruction. It includes biographical information with basic facts about birth year, birth place, and family. But also that the chatbot should play the role of a deceased artist and respond in a certain way. This way, there is control over what the software/program is used for.

If the audience asks something outside the context, the chatbot says it cannot provide an answer.

Photos from the exhibition and DTU Compute. Credit: Hanne Kokkegård

Against All Odds – Historical Women and New Algorithms

Professor Morten Mørup (right) and MSc student at DTU Compute Niels Raunkjær Holm. Credit: Hanne Kokkegård

The art of deep fake

This is the first time SMK has created a chatbot to interact in entirely new ways with materials, etc.

The students are delighted to see how their learning can be practically applied in a more untraditional way.

“It has been super exciting. It also related to our bachelor’s project, where we worked really well together. That the technology is now being used in an art exhibition is really fun to be part of,” says Niels Raunkjær Holm, who was invited to the opening last Saturday to talk about his, Michael’s, and Andreas’s work with the chatbot.

Professor Morten Mørup highlights that the chatbot gives a taste of the near future.

“It has great potential in terms of creating entirely new ways to bring exhibitions to life and communicate in new ways in new formats. The fact that you can talk to a system and get an immediate response is something we will generally see much more of, also in smartphones and everyday life. Today we write with a chatbot, but soon, we will talk to it, which will give a more natural experience of communication,” says Morten Mørup.

At the museum, the chatbot has its own room where visitors can read about the ideas behind the communication – and try out the chatbot themselves, which is placed in a kind of lectern.

The visual backdrop is a screen where the asked questions are transcribed, and then the chatbot responds. Immediately after the response, the screen shows a kind of spherical version of Anne Marie Carl Nielsen in black and white dots (like pixels), which fade in and out in time with soft music. According to Majken Overgaard, the museum wants to clearly point out that it is not reality – but a digital artistic approach.

This emphasis is important in a time when AI and fake news, for example, are used politically in election campaigns and open the possibility of influencing opinions, explains Morten Mørup.

“You need to be aware of what is real and what is fake in a time when generative technologies can imitate both living and dead people. And where the technologies are moving quickly and getting better. We hope that our students’ contribution to the exhibition can help show how generative AI and deep fake technologies can be used in a way that can make learning more interesting.”

As bachelor students, Niels studied Software Technology, while Michael and Andreas studied Artificial Intelligence and Data, graduating in June 2023. Now, Niels is studying for the master's in Mathematical Modelling and Computation, and Michael and Andreas are studying for the master's in Human-Centered Artificial Intelligence.

The chatbot was developed as a special course, for which they received 10 ECTS credits.

AI Programmes used in the chatbot

Niels Raunkjær Holm, Michael Alexander Harborg og Andreas Holmer Bigom have used the following AI programmes

Speech-to-text:

Two different models depending on the language. If the spoken question is in Danish: chcaa/xls-r-300m-danish-nst-cv9. English: OpenAI Whisper Base.

Language models:

For offline generation (i.e., done in advance for use in the database) of synthetic summaries and relevant questions for each passage (these are generated to increase the accuracy of retrieval) in our collection of documents about Anne Marie Carl-Nielsen, we used GPT-4-Turbo.
For our relevancy evaluator, which assesses whether questions are relevant to the conversation (i.e., for a conversation with AMCN), we used GPT-4o.
For our text generation to answer questions, we used GPT-4o.

Retrieval Augmented Generation:

Passage retrieval: We automatically divided each document in our document corpus into small passages of about 3-5 sentences using an LLM. Then we generated a simplified summary for each passage, as well as 5 relevant questions that the passage answers using GPT-4-Turbo.
We then generated embeddings of each summary and each individual question using OpenAI Ada-002 (these are generated to hopefully optimise retrieval by increasing the similarity between the user’s question and relevant passages).
When we run retrieval online (online here means it happens when a user interacts with the system), we find the 3 most relevant passages based on the L2 norm between the embeddings of the synthetic questions and the posed question (thus a total of 6 passages).
Relevancy Evaluation: We use GPT-4o to assess whether a question is relevant. A system prompt is provided that describes the role, the 6 relevant passages, the previous 2-6 conversation messages, the posed question, and a message asking if the question is relevant. If the question is not relevant, we provide a predefined message to the user.
Prompts: Our prompts are primarily based on a qualitative analysis of the output. If we encounter scenarios where the output is unsatisfactory, we use ChatGPT-4 to optimize our prompt by describing the desired output and our current prompt. Additionally, 5 prompts for our relevance evaluator were tested on 20 relevant and 10 non-relevant questions, after which we used the prompt with the highest accuracy compared to the desired output.
Text-to-speech technology: Google “da-DK-Wavenet-E” for Danish outputs and Google “en-US-Journey-F” for English outputs. Both are through Google’s API.
Voice conversion technology: Here we use Free VC 24kHz. The underlying data is voice actor Lotte Andersen, who has recorded 24 minutes of Danish speech for Danish VC and 33 minutes of English speech for English VC.

Contact

Morten Mørup Professor mmor@dtu.dk

Tue Herlau Associate Professor tuhe@dtu.dk