
Meeting / Debate

Automated L2 Speaking Assessment (AL2SA) International Workshop 2026

Conference, Meeting / Debate, Innovation, Pedagogical innovation, Research, Transversal seminar. From 5 March 2026 to 6 March 2026
Date details

9:00 - 18:00

Saint-Martin-d'Hères - Domaine universitaire

Location details

Salle Jacques Cartier at the Maison des Langues
1141 Avenue Centrale, 38400 Saint-Martin-d'Hères, France

Recent advances in artificial intelligence, particularly multimodal language models, are opening unprecedented opportunities for automated assessment of speaking skills in second language learning. Yet fundamental questions remain: What exactly should we assess in L2 oral production? How can technology best serve valid, reliable, and ethical assessment practices?

The AL2SA workshop (pron. /ˈæl.tsə/) brings together international experts to address these critical issues. Over two days, we will explore the theoretical foundations, methodological challenges, and practical applications of automated speaking assessment, balancing technological innovation with pedagogical and psychometric rigor.

 

Key Information

  • Dates: March 5-6, 2026

  • Format: Hybrid (on-site + Zoom)

  • Venue: Salle Jacques Cartier, Maison des Langues et des Cultures, Université Grenoble Alpes

  • Registration: Free and open to everyone – registration required

  • Lunch registration deadline: February 20, 2026

>> Register here! <<

Organized by: UGA's Language Skills Assessment Unit (Cellule d'évaluation des compétences en langues, Service des Langues), in partnership with the LIDILEM and LIG laboratories, UGA's Language Center, and the Multidisciplinary Institute in Artificial Intelligence (MIAI).


Practical Information

📍 Venue & Access

Salle Jacques Cartier
Maison des Langues et des Cultures
Université Grenoble Alpes
1141 Avenue Centrale, 38400 Saint-Martin-d'Hères, France

  • From Grenoble train station: Take Tram B (25 min) to Bibliothèques Universitaires (Direction: Gières Plaine des Sports)

  • From Gières Gare – Université station: Take Tram B (5 min) to Bibliothèques Universitaires (Direction: Oxford)

🌍 Remote Participation

Join via Zoom: https://univ-grenoble-alpes-fr.zoom.us/j/94720156701?pwd=Zyhi1WuR3UxAuaa2be5hPyq8bowJmT.1

 

Program

THURSDAY, MARCH 5TH

9:00-9:15 OPENING (Alice Henderson, Univ. Grenoble Alpes, France)

Session n°1: Assessing speaking skills

9:20-9:50 Linda Terrier & Lionel Fontan 
Univ. Toulouse Jean-Jaurès, France
Archean Labs, France
linda.terrier [at] univ-tlse2.fr (Slides upon request; work in progress)

The measurement of intelligibility has become a cornerstone of L2 speech assessment, whether human or automated. However, this construct, which seems obvious and stable at first glance from Munro and Derwing's 1995 seminal definition (“the extent to which a speaker's message is understood by a listener”), proves to be much more complex when we look at the concrete methods used to evaluate it. Like listening comprehension, intelligibility can only be measured indirectly, which systematically raises two fundamental questions: what exactly is being measured through the proposed elicitation task, and through the chosen evaluation method?

This impossibility of direct access to a listener’s understanding—and thus to the speaker’s intelligibility—renders any measurement of intelligibility fundamentally problematic and complex. Our recent scoping review (Terrier et al., under review) has further revealed a wide diversity of elicitation tasks and methods used to assess intelligibility. Sound identification, orthographic transcription, keyword spotting, subjective ratings, comprehension questions… each modality engages the listener at different levels of processing, resulting in the measurement of partially distinct constructs.

In the first part of this presentation, Linda Terrier will situate the construct of intelligibility within the broader framework of listening comprehension, analyzing several assessment modalities from the perspective of the Kintsch & van Dijk model of comprehension (1998), which distinguishes between low and high levels of comprehension through the construction of the microstructure, macrostructure, and situation model of the message at hand.

To concretely illustrate these issues, Lionel Fontan will then present an example of a recently developed task: oral translation of short sentences. This type of task provides a semantic reference for the message the learner wishes to convey, while allowing flexibility in the linguistic form. Lionel Fontan has used this task to investigate the external validity of subjective intelligibility ratings, and to analyze the bias introduced by the absence of a reference for listeners.

Ultimately, because of the inherent complexity of intelligibility in L2 speech, we argue that any approach to its assessment—especially automated assessment—must begin by explicitly stating what aspect of intelligibility is being measured and through which task.

9:50-10:20 Nivja de Jong
Leiden Univ., the Netherlands
View presentation slides

In current classrooms, among the second language (L2) skills, practicing and assessing speaking are often neglected. Its loud and transient nature makes it hard for teachers to provide individualized feedback, and assessing speech recordings is highly time-consuming. Automated speaking assessment can help address these issues. In this presentation (based on De Jong et al., 2025), I first define speaking as a skill and outline the requirements for high-quality, practical, and ethical tools for automated scoring and feedback. Then, drawing on the AI-based assessment framework (Fang et al., 2023) and an educational design perspective, I propose recommendations on how computational linguists, educators, and assessment practitioners can join forces to develop automated systems that are technically sound, ethically responsible, and likely to be adopted in educational practice.

References

De Jong, N. H., Raaijmakers, S., & Tigelaar, D. (2025). Developing high-quality, practical, and ethical automated L2 speaking assessments. System, 134, 103796. https://doi.org/10.1016/j.system.2025.103796

Fang, Y., Roscoe, R. D., & McNamara, D. S. (2023). Artificial intelligence-based assessment in education. In B. Du Boulay, A. Mitrovic, & K. Yacef (Eds.), Handbook of Artificial Intelligence in Education (pp. 485–504). Edward Elgar Publishing. https://doi.org/10.4337/9781800375413.00033

10:20-11:00 Discussion

11:00-11:30 ☕ BREAK ☕

Session n°2: Listening Disfluency

11:30-12:10 Nobuaki Minematsu
Univ. of Tokyo, Japan
View presentation slides

Measuring, Analyzing, and Predicting Listening Disfluency of Learners and Raters: Using Speech and AI Technologies for Automated Assessment

Every learner aims to become easy to understand in L2 speech communication, yet their unique pronunciation may sometimes hinder this goal. Listening is a mental process that is difficult to observe directly, which is one reason learners often feel anxious about how smoothly they are understood. How can we measure listening disfluency? Do we need expensive brain-sensing techniques to quantify it?

In this talk, we present a pedagogically valid and practical method for measuring listening disfluency. Shadowing is an immediate reproduction of presented speech with a short delay, in which listeners repeat what they hear in their own accent. When listeners experience perceptual difficulty, their shadowed reproduction breaks down, revealing points at which cognitive processing load increases. By analyzing these disruptions, we can capture dynamic properties of listening disfluency.

We then demonstrate two applications of this measured disfluency. The first is visualizing global communicability, which represents how easily individual learners from around the world understand others and how easily they are understood in return. The second is the development of a virtual shadowing rater, built by collecting a human rater’s shadowing data for L2 English and using it to model intelligibility-based L2 speech assessment.

Keywords:

listening disfluency, shadowing-based assessment, L2 intelligibility, global communicability

12:10-12:30 Noriko Nakanishi (online)
Kobe Gakuin Univ., Japan
Slides for participants only

While AI-based tools provide useful automated assessments of L2 fluency, they cannot fully replicate the socio-emotional dynamics of actual human communication. This presentation introduces the Shadowing Exchange Community, a peer-to-peer program that utilizes AI technology not as a final goal, but as a scaffolding device to enhance human-to-human interaction.

In this program, participants record 30-second speeches in both English and Japanese. During this process, they are presented with immediate Automatic Speech Recognition (ASR) results, allowing them to verify intelligibility—at least to the system—and re-record as needed before engaging with other learners. This ensures that AI serves to prepare participants for the community's core activity: a reciprocal exchange where learners from diverse backgrounds shadow each other and provide mutual feedback.
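As a rough illustration of such an ASR-based self-check (this sketch is ours, not the program's actual implementation; the word-overlap measure and the 0.8 threshold are assumptions for illustration only), a learner's intended text can be compared with the ASR transcript and a re-recording suggested when too few intended words are recognized:

```python
def word_overlap(intended: str, asr_output: str) -> float:
    """Fraction of intended words recovered in the ASR transcript
    (case-insensitive bag-of-words overlap; a deliberately crude proxy)."""
    intended_words = intended.lower().split()
    pool = asr_output.lower().split()
    hits = 0
    for w in intended_words:
        if w in pool:
            pool.remove(w)  # each ASR word can match only once
            hits += 1
    return hits / len(intended_words)

def needs_rerecording(intended: str, asr_output: str,
                      threshold: float = 0.8) -> bool:
    """Suggest re-recording when the overlap falls below the threshold."""
    return word_overlap(intended, asr_output) < threshold

print(needs_rerecording("I live in Kobe", "I live in Kobe"))   # False
print(needs_rerecording("I live in Kobe", "I leave in coby"))  # True
```

A production system would of course use a more forgiving comparison (normalizing punctuation, accepting near-homophones), but the loop is the same: record, transcribe, compare, and retry if needed.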

Crucially, the program is designed to enhance cross-cultural sensitivity and communication skills, with a specific focus on how to provide supportive feedback. This approach is structured around three key goals:

1. Practicing listening to various accents.
2. Checking one's own intelligibility with speakers of other L1s.
3. Learning to give constructive, respectful, and supportive feedback.

This cyclical, community-based model offers valuable insights into the social, emotional, and intercultural dimensions of language learning. As of November 2025, the program has engaged approximately 180 participants, including English L1 speakers (~70), Japanese L1 speakers (~60), and others (~50). 

This presentation will share the program's structure and discuss preliminary findings on its impact on learners' metalinguistic awareness and affective filter.

Keywords: Shadowing, Peer Feedback, Cross-Cultural Communication, AI Scaffolding, Socio-Emotional Learning

12:30-13:00 Discussion

13:00-14:30 🍴 LUNCH 🍴 (Provided, please register)

Session n°3: Teachers' Open Session

14:30-15:30 Beata Walesiak
unpolish.pl, Poland
Slides for participants only

In this talk, educators will learn about the pedagogical use of commercially-available pronunciation and speech coaching apps, with a focus on the features and functionalities they include and the way they can be integrated into teaching and learning practices (in classroom instruction and in learners’ self-study). At the same time, the talk will critically examine common promises made by app developers, such as efficiency or judgement-free feedback, and discuss app limitations as well as implications of positioning AI as an authoritative evaluator of learner performance. The aim of the session is to help teachers make informed, context-sensitive choices when using the apps in pedagogy.

Bio
Beata Walesiak is a lecturer for Open University at University of Warsaw (UOUW) and Language Science and Technology (LST) at the Institute of Applied Linguistics at University of Warsaw, Poland. She’s also a teacher trainer, linguist and researcher with unpolish.pl. She has cooperated with a number of schools, academic institutions and start-ups within the domain of educational technologies, mobile and computer assisted pronunciation training, and AI-based speech pedagogy and assessment. She is also a dedicated IATEFL Pronunciation Special Interest Group (PronSIG) Committee member.

15:30-16:00 Sylvain Coulange, Pinxun Huang & Eli Stafford
Univ. Grenoble Alpes, France
Univ. Lorraine, France
Univ. Paris Cité, France
View presentation slides

This talk presents the development of an automated speaking assessment module for SELF, the online language placement test at Université Grenoble Alpes (https://self.univ-grenoble-alpes.fr/english/). The project, known as SELF Production Orale (SELF PO), is a collaborative effort between the Laboratoire de Linguistique et Didactique des Langues Étrangères et Maternelles (LIDILEM) and the Laboratoire d'Informatique de Grenoble (LIG). Two speaking modules are currently under development: one for English and one for French, with the French module developed in partnership with CUEF and ADCUEFE.

This talk will focus on the English speaking module, covering the development phases, an overview of the speaking tasks and assessment criteria, and preliminary results. We will conclude with a discussion of the key challenges and limitations encountered in implementing automated speaking assessment at scale, offering insights for institutions pursuing similar initiatives.

16:00-16:30 Discussion

16:30-17:30 ☕ SOCIAL BREAK ☕

FRIDAY, MARCH 6TH

Session n°4: Intelligibility

9:00-9:30 Dan Frost
Univ. Grenoble Alpes, France
View presentation slides

Over the last 25 years, the focus of pronunciation teaching has increasingly moved away from "native speaker norms" towards teaching for intelligibility, or as Levis (2005; 2020) puts it, the "nativeness principle" vs. the "intelligibility principle". While this is a noble aim for the majority of learning situations, what makes learner speech more or less intelligible is still very much up for debate. Much of my work over the past ten years has been an attempt to better understand the nature of intelligibility and its relationship to comprehension, particularly in the context of French learners of English. To this end, we developed a set of descriptors (Frost & O'Donnell, 2018). The descriptors were initially created following the longitudinal ELLO project (Frost & O'Donnell, 2015), where we identified that the original CEFR phonological control descriptors (Council of Europe, 2001) lacked the necessary precision to address the language-specific needs of our learners. While the Companion Volume to the CEFR (Council of Europe, 2020) has gone further in recognizing the importance of prosodic features, its common, universal nature still fails to address the language-specific needs of learners.

The Prosody Descriptors we developed are an attempt to address language specificity, and serve a dual function. First, they enable an accurate assessment of English pronunciation aspects that are particularly problematic for French speakers, focusing on features that significantly impede intelligibility. Second, they function as a practical pedagogical tool, allowing both learners and teachers to establish clear, actionable goals for pronunciation instruction. Although calibrated for French speakers, the features targeted by these descriptors are valid across all learners of English. The tool has since been deployed and validated in several subsequent studies (Frost, 2021; 2022; forthcoming; Vézien & Frost, forthcoming), confirming its utility and accuracy. 
This presentation will explore the nature of intelligibility, the relationship between perception and production, particularly relating to pronunciation, and specifically in relation to the pronunciation of English by French learners. I will outline how these questions informed the development of the descriptors, and how they continue to inform my work as I try to better understand how to help my learners understand English and make themselves understood in a variety of both national and international contexts in English.

References
Council of Europe. (2001). Common European Framework of Reference for Languages: Learning, teaching, assessment. Cambridge University Press.
Council of Europe. (2020). Common European Framework of Reference for Languages: Learning, teaching, assessment. Companion volume. Council of Europe Publishing.
Frost, D. (forthcoming). Pronunciation assessment: Deconstructing intelligibility and setting learning objectives. La clé des langues.
Frost, D. (2022). Doing pronunciation online: An embodied and cognitive approach, which puts prosody first. RANAM (Recherches Anglaises et Nord-AMéricaines), 55, 11–28.
Frost, D. (2021). Prosodie, intelligibilité et compréhensibilité : l'évaluation de la prononciation lors d'un stage court. Les Langues Modernes, 3(2020), 76–90.
Frost, D., & O'Donnell, J. (2018). Evaluating the essentials: The place of prosody in oral production. In J. Volín (Ed.), The Pronunciation of English by Speakers of Other Languages (pp. 228–259). Cambridge Scholars Publishing. ISBN: 1-5275-0390-9
Frost, D., & O'Donnell, J. (2015). Success: B2 or not B2, that is the question (the ELLO project – Étude Longitudinale sur la Langue Orale). Recherche et pratiques pédagogiques en langues de spécialité – Cahiers de l'APLIUT, 34(2). https://doi.org/10.4000/apliut.5195
Levis, J. M. (2005). Changing contexts and shifting paradigms in pronunciation teaching. TESOL Quarterly, 39(3), 369–377. https://doi.org/10.2307/3588485
Levis, J. (2020). Revisiting the intelligibility and nativeness principles. Journal of Second Language Pronunciation, 6(3), 310–328. https://doi.org/10.1075/jslp.20050.lev

Vézien, S., & Frost, D. (in preparation). Talking Heads: Improving pronunciation with text-to-speech software. La clé des langues.
 

9:30-10:00 Kevin Hirschi
Univ. of Texas San Antonio, USA
Slides for participants only

Second language (L2) intelligibility represents a precursor for communication in which sounds, words, or phrases are understood by a listener. Therefore, a comprehensive understanding of what constitutes intelligibility in speech—and what causes loss of intelligibility—can provide insights into the development of L2 proficiency, inform L2 learning curricula, and create parameters for effective assessment and feedback. Focusing on L2 English in the North American academic context, this presentation begins with a review of research on linguistic features associated with intelligibility (e.g., Kang et al., 2018, 2020), as well as their complex, nonlinear predictive power across listener backgrounds (Hirschi et al., 2023, 2025; Shekar et al., 2023). I then review the alignment of audio LLMs with human listeners through the lens of intelligibility, analyzing divergence from listeners and relating these issues to model bias (Hirschi & Kang, 2024; Kang & Hirschi, 2025).

With an understanding of the challenges of aligning machine listening with human comprehension, I argue that the central goal of designing automated measurement and feedback solutions for L2 intelligibility starts from pedagogy informed by theory and research, rather than technological capacity. As such, I will focus the remainder of the presentation on the theoretical tenets and research-informed practices which can guide the design of automated measurement and feedback of L2 intelligibility for inclusive, effective, and sustainable L2 learning. Drawing from the social nature of language and Sociocultural perspectives (Vygotsky, 1987), automated measurement and feedback theoretically provide learners scaffolding and a stress-free simulation of interaction. Interactionist literature on feedback further informs the construction and delivery of automated feedback (Long, 1996), and outlines measurement that is most relevant for learning. Furthermore, learner agency and proactive behavior explain why and how some learners independently make more progress with automated learning tools, offering insights into differential interventions for important individual differences (Duff, 2012; Papi, 2025). I will conclude by presenting an early effort in implementing pedagogically informed automatic feedback (Hirschi et al., 2025) as a proof of concept and potential roadmap for L2 intelligibility measurement and development. 

10:00-10:30 Joan Carles Mora
Univ. de Barcelona, Spain
View presentation slides

Assessing the development of L2 pronunciation at the segmental level is a methodological challenge, especially after short phonetic training interventions (e.g., four 30-minute high-variability phonetic training sessions focusing on one target contrast) or short L2 pronunciation pedagogical interventions (e.g., a few sessions of pronunciation-focused task-based teaching), where the size of improvement is expected to be small. Still, pronunciation assessment is crucial for evaluating the effectiveness of different phonetic training techniques and pedagogical approaches to pronunciation instruction. For example, Saito & Plonsky (2019) meta-analysed 77 pronunciation teaching studies and found that those assessing pronunciation through acoustic measures focusing on specific speech dimensions (e.g., VOT, formant frequencies) in controlled elicitation tasks (e.g., read-aloud) found pronunciation instruction to be more effective than those assessing pronunciation through perceptual judgments focusing on global dimensions (e.g., comprehensibility) in spontaneous speech (e.g., a monologic oral narrative task). Since segmental pronunciation training and instruction typically focus on challenging L2 vowel and consonant contrasts (e.g., /r/-/l/ for Japanese learners of English; [ð]-[ɾ] for English learners of Spanish; /iː/-/ɪ/ for Spanish learners of English) that carry a high functional load and can have a detrimental impact on L2 speech intelligibility, it is important to address the following pronunciation assessment issues in relation to improvement in segmental contrasts resulting from phonetic training or pronunciation instruction:

(1) Should improvement be measured in terms of contrastiveness, nativelikeness, or both?
(2) Should contrastiveness / nativelikeness be measured acoustically or perceptually? If perceptually (by humans), how? Rating tasks, discrimination and identification tasks, intelligibility tasks?
(3) To what extent is improvement in segmental contrasts measurable in spontaneous speech? How can we evaluate the impact of segmental learning (and phonetic training and pronunciation instruction focusing on segmental contrasts) on L2 speech intelligibility?

The overall aim of this talk is to stimulate discussion about the implications of the answers to these questions for the automated assessment of L2 pronunciation (at the segmental level) and the evaluation of segmental pronunciation features having an impact on L2 speech intelligibility.


References
Saito, K., & Plonsky, L. (2019). Effects of second language pronunciation teaching revisited: A proposed measurement framework and meta‐analysis. Language Learning, 69(3), 652-708.

10:30-11:00 Discussion

11:00-11:30 ☕ BREAK ☕

Session n°5: Interactions

11:30-12:00 Serge Bibauw & Zhaori Wang
Univ. Catholique de Louvain, Belgium
KU Leuven
View presentation slides

Abstract to be added soon.

12:00-12:30 Tsuneo Kato
Doshisha Univ., Japan
Slides for participants only

A substantial amount of form-focused practice is necessary for second language (L2) learners who are transitioning from answering questions in a few words to answering in full sentences, and for those expanding their range of expressions. We are developing a computer-assisted language learning (CALL) system that focuses on learning a syntactic form through conversing with two computer characters. The trialogue-based CALL system promotes a learner's implicit learning of the focused form by first demonstrating a model conversation between the characters, then asking the learner similar questions. With recent advances in automatic speech recognition (ASR) and natural language processing (NLP), we added a simple prompt corrective feedback (CF) function for the learner's answers using Whisper ASR and GPT-4o. We conducted a comparative experiment in which two groups of Japanese university students practiced the English inanimate-subject construction with and without the CF and took a pre-test, a post-test, and three retention tests over a period of up to 100 days. The experimental results showed a significant effect of the CF in all the post- and retention tests. To improve the appropriateness of the CF, we further developed an automatic classifier that sorts learners' errors into global errors, which hinder communication, and local errors, which do not. The accuracy of the classifier was measured by comparing it with manual classification by native speakers of English, and it improved with prompt engineering applied to a large language model (LLM).

12:30-13:00 Mayuko Aiba & Nobuaki Minematsu
Univ. of Tokyo, Japan
View presentation slides 1
View presentation slides 2

Large language models (LLMs) are increasingly used to support academic language learning, yet their educational value depends critically on how interaction is designed and situated. This presentation reports three case studies exploring LLM-based interaction for supporting students’ academic communication in higher education.

The first study investigates a GPT-based oral Q&A simulation system designed to help students prepare for their first international conference. By generating realistic questions based on students’ own papers and enabling spoken interaction, the system provides scalable opportunities to practice academic Q&A without intensive instructor involvement.

The second study focuses on feedback after academic Q&A sessions. We propose a BI-R framework that extends the Belief–Desire–Intention (BDI) model by explicitly incorporating Respect as a guiding principle for feedback generation. Experimental results suggest that while deep mental-state reasoning alone does not always outperform baseline approaches, feedback that embeds social sensitivity can be particularly effective for certain question types and learner characteristics.

The third study, LangInLab, explores situated interaction in engineering education by integrating vision- and voice-enabled AI agents into laboratory classes. Through role-based multimodal interaction, students practice technical English within authentic experimental contexts.

Together, these case studies illustrate how carefully designed LLM-based interactions can enhance academic language learning across diverse educational settings.

Keywords:
LLM-based Interaction, Spoken Academic Communication, Academic Language Learning

References (for the three case studies)
Aiba, M., Saito, D., & Minematsu, N. (2025). GPT-based simulation of oral Q&A to support students attending first conference. JALTCALL Trends, 1(1), 2163. https://doi.org/10.29140/jct.v1n1.2163
Aiba, M., Saito, D., & Minematsu, N. (2026) Incorporating Respect into LLM-Based Academic Feedback: A BI-R Framework for Instructing Students after Q&A Sessions, Proc. IWSDS (to appear)
Shigi, M., Rackauckas, Z., Akiyama, Y., & Minematsu, N. (2025) LangInLab: Augmenting Engineering Lab Instruction with Vision-and Voice-Enabled AI Agents for Language Learning, Proc. Human-Agent Interaction 2025.

13:00-13:30 Discussion

13:30-15:00 🍴 LUNCH 🍴 (Provided, please register)

Session n°6: LLM-Based Assessment

15:00-15:30 Nicolas Ballier
Univ. Paris Cité, France
View presentation slides

This paper presents a method to automatically compare the probabilities assigned by Whisper, which can be used for L2 speech scoring (Ballier et al., 2024), to the phoneme distribution probabilities assigned by Phonetic Posteriorgrams (PPGs; Morrison et al., 2024).

Whisper is a foundational speech model trained on over 90 languages to transcribe speech into text. It can be used to predict the spoken language and to perform automatic speech recognition (ASR). A standard metric for the quality of a transcription is word error rate (WER), which computes the distance between the Whisper transcription and the text actually pronounced. This requires a reference transcription, so we investigate a textless method based on the acoustic prediction of what is actually in the signal, using Phonetic Posteriorgrams (PPGs). A PPG (Morrison et al., 2024) is a time-based categorical distribution over acoustic units of speech (usually phonemes). This type of representation has been used to disentangle pronunciation features (Churchwell et al., 2024; Morrison et al., 2024) and to provide an interpretable representation in terms of phone categories. Several models have been trained on the TIMIT dataset to produce PPGs. The assumption is that the signal can be interpreted in terms of phonemic realization, depending on the categories used in the training data (typically IPA symbols or TIMIT transcription conventions). For example, the ppgs library (Churchwell et al., 2024) uses 42 categories, and for a given portion of the speech signal a probability over these 42 categories can be assigned to the phone realization. Usually a top-k method is applied, and the highest probability corresponds to the phone actually predicted.
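For readers unfamiliar with the metric, WER as described above is a word-level edit distance (substitutions, insertions, deletions) normalized by the reference length. This minimal Python sketch is purely illustrative and not taken from the paper:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of words in the reference transcription."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deleted word out of six
```

In practice a library such as jiwer is typically used instead of a hand-rolled implementation, but the computation is the same.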

We compare these phone probabilities assigned by posteriorgrams to the Whisper transcriptions, using the token-level probabilities (Liang et al., 2025; Ballier et al., 2024) obtained for the Whisper transcriptions. (Whisper transcriptions rely on a tokenization produced by a specific algorithm; see Ballier et al., 2024 for details.)

This paper therefore aims at comparing the Whisper predictions at token level (syllables, pseudo-syllables, or words) with the corresponding probability distributions over phonemes for the same portion of the signal. We discuss alignment issues, the different sizes of the time frames, and the implementation methods. Several implementations of the posteriorgrams use a 20-millisecond time frame, meaning that a probability distribution is assigned to each 20-millisecond portion of the signal. We discuss the methods that can be used when the segment to be analyzed is longer than 20 milliseconds. We report preliminary investigations on the ISLE data.
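To make the frame-aggregation question concrete, one simple option (an assumption on our part, not necessarily the method the authors adopt) is to average the per-frame categorical distributions over all 20 ms frames spanned by a token and then take the top phone:

```python
def aggregate_frames(frame_probs):
    """Average a sequence of per-frame categorical distributions
    (one list of phone-category probabilities per 20 ms frame)
    into a single distribution for the whole segment."""
    n_frames = len(frame_probs)
    n_cats = len(frame_probs[0])
    return [sum(frame[k] for frame in frame_probs) / n_frames
            for k in range(n_cats)]

def top_phone(distribution, labels):
    """Return the phone label with the highest probability (top-1)."""
    best = max(range(len(distribution)), key=lambda k: distribution[k])
    return labels[best]

# Toy example: three hypothetical phone categories over four frames
frames = [[0.7, 0.2, 0.1],
          [0.6, 0.3, 0.1],
          [0.2, 0.7, 0.1],
          [0.1, 0.8, 0.1]]
segment = aggregate_frames(frames)
print(top_phone(segment, ["AH", "IY", "S"]))  # "IY" dominates the averaged distribution
```

Other aggregation choices (max-pooling, duration-weighted averaging, averaging in the log domain) would yield different segment-level distributions; which one best matches Whisper's token-level probabilities is precisely the kind of implementation question the talk addresses.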

15:30-16:00 Stefano Bannò
Univ. of Cambridge, UK
View presentation slides

Natural language-based assessment (NLA) is an approach to second language assessment that uses instructions - expressed in the form of can-do descriptors - originally intended for human examiners, aiming to determine whether large language models (LLMs) can interpret and apply them in ways comparable to human assessment. In this work, we explore the use of such descriptors with an open-source LLM, Qwen 2.5 72B, to assess responses from the publicly available S&I Corpus in a zero-shot setting. Our results show that this approach - relying solely on textual information - achieves competitive performance: while it does not outperform state-of-the-art speech LLMs fine-tuned for the task, it surpasses a BERT-based model trained specifically for this purpose. NLA proves particularly effective in mismatched task settings, is generalisable to other data types and languages, and offers greater interpretability, as it is grounded in clearly explainable, widely applicable language descriptors.

16:00-16:30 Luis Da Costa & Laura Rupp
Vrije Univ. Amsterdam, the Netherlands
View presentation slides

In this talk, we present ongoing work on building a learner corpus of spoken English derived from a large Massive Open Online Course (MOOC), English Pronunciation in a Global World. We describe our design choices and annotation strategies, highlighting key insights gained from collecting and analyzing learner speech data at large scale. In the second part of the talk, we discuss experiments using Automatic Speech Recognition (ASR) models to support speech assessment. These experiments explore zero-shot ASR performance, ensemble approaches that combine multiple models, and continued pretraining to enhance accuracy and robustness. Together, these efforts aim to advance scalable, data-driven approaches to spoken language learning and assessment.

16:30-17:00 Discussion

17:00-17:30 CLOSING (Sylvain Coulange, Univ. Grenoble Alpes, France)

Printable version of the program


 

AL2SA

Contact

Sylvain Coulange

sylvain.coulange [at] univ-grenoble-alpes.fr

Automated L2 Speaking Assessment

Conférence, Rencontre / Débat Innovation pédagogique, Séminaire transversal April 11, 2025

9:00 - 13:00

Saint-Martin-d'Hères - Domaine universitaire

Conference Room – Médiat Rhône Alpes (Google Maps)

The LIDILEM laboratory and the Service des Langues are jointly organizing a one-day event on the automated assessment of L2 spoken production. This meeting will be an opportunity to present ongoing and future projects and to discuss our research challenges.

The event is open to all, free of charge, and does not require registration. It will be followed by a buffet lunch provided by the Service des Langues and SELF Innovalangues (reservation required before March 31).

Program

📅 Friday, April 11
📍 Conference Room – Médiat Rhône Alpes (Google Maps)
🕒 30 min per speaker (including project presentation & Q&A)

9:00 – 9:30 | Welcome Coffee

🎤 9:30 – 11:00 | Ongoing and Future Projects (1)

  • Sylvain Coulange (LIDILEM/LIG, UGA) – Design and Implementation of a Score Prediction Model for L2 Spontaneous Speech in SELF Placement Test
  • Marco Dinarelli (LIG, UGA) – Presentation of the JANUS Project: Large Multi-modal Language Models for L2 Language Acquisition
  • Tsuneo Kato (SLPL, Doshisha University) – Learning Second Language Expression with Form-Focused Trialogue-Based CALL System

11:00 – 11:30 | Coffee Break

🎤 11:30 – 13:00 | Ongoing and Future Projects (2)

  • Laura Rupp (Centre for Global English, Vrije Universiteit Amsterdam) – Developing an Automated Pronunciation Checker that Assesses the Intelligibility of English Pronunciation
  • Nicolas Ballier (ALTAE, Université Paris Cité) – Whisper for L2 Scoring: From Segmental to Suprasegmental Features
  • Antonio Romano (Lab. di Fonetica Sperimentale “Arturo Genre”, Torino University) – Assessing the Prosodic Components of L2 Speech Using a Chatbot

🍽 13:00 – 14:30 | Lunch Buffet (provided by Service des Langues and SELF Innovalangues – Register Here)
📍 Room Magellan (Google Maps)

 


Practical Information

📍 Venue & Access

📍 Conference Room – Médiat Rhône-Alpes (Google Maps)

🚆 Access from Train Stations

  • From Grenoble Gares: Take Tram B (25 min) to Université – Condillac (Direction: Gières Plaine des Sports)
  • From Gières Gare – Université: Take Tram B (5 min) to Université – Condillac (Direction: Oxford)

🌍 Remote Participation

We encourage all participants to attend in person, but for those unable to travel to Grenoble, a Zoom link is provided for remote access:

https://univ-grenoble-alpes-fr.zoom.us/j/98433223399?pwd=dZw39oBKqhQltsa5e41bc45QQYr11C.1

Meeting ID: 984 3322 3399

Passcode: 659265


Contact

Sylvain Coulange

sylvain.coulange[at]univ-grenoble-alpes[dot]fr

Journée d'Accueil des Doctorant-e-s du Lidilem 2022

JAD, Rencontre / Débat Communauté_doctorante, Vie de l'établissement March 11, 2022

9:00 - 18:00

Saint-Martin-d'Hères - Domaine universitaire

Room D205 bis, Stendhal building
Also available via Zoom

Morning (9:00-12:00)
"I don't get how anything works at LIDILEM": structures and operations
____________________________________________

In-person and virtual welcome and opening words. COME IN YOUR BEST SWEATER! (20 min)
Iris Fabry, Alexis Ladreyt, Roxanne Comotti (LIDILEM PhD student representatives)

LIDILEM, UGA and their governing bodies: structure and operation (30 min)
Iva Novakova, Jean-Pascal Simon (LIDILEM directors), Jérôme Barona (administrative manager), Halima Bouhtala (PhD student)

Resources for PhD students (30 min)
Isabelle Rousset (research engineer)

CEDIL22, writing sessions and Paren(Thèse) days, office sharing, the Thèse Emoi association (30 min)
Multiple presenters

BREAK (10 min)

Introduction to the legal frameworks for corpora containing sensitive data (1 h)
Talk by Laurence Durroux (teacher-researcher, LIDILEM)
 

Lunch break (12:00-14:00)
Let's feast!

____________________________________________

Possibly a shared picnic, depending on the health rules in place; let the fun begin!
 

Afternoon (14:00-18:00)
"Wow, what we do at LIDILEM is so cool": collaborative work and workshops

____________________________________________

Presentation of the Gender, Sexes and Sexualities action (40 min)
Talk by Martine Pons (teacher-researcher, LIDILEM)

BREAK (10 min)

LIDILEM PhD students' FAQ workshop (1 h 30)
Manon Boucharéchas, Roxanne Comotti, Iris Fabry (PhD students)

BREAK (10 min)

PhD student association workshop: presentation of Litthésarts and collective decision-making (max. 1 h 30)
Litthésarts and Alexis Ladreyt (PhD students)
 

Evening outing (from 18:00)

Let's go enjoy a well-earned pint and have a blast! Venue to be decided

Program (PDF, 1.28 MB)


Contacts

elus-doctorants-lidilem[at]univ-grenoble-alpes[dot]fr
Facebook page: https://www.facebook.com/cdlidilem/
Twitter account: https://twitter.com/cdlidilem
WhatsApp group "Doctorant.e.s du Lidilem"


Journées de la francophonie 2021

Journée francophonie, Rencontre / Débat Communauté_doctorante March 18-20, 2021

For several years, LIDILEM's PhD community has contributed to the design, organization and running of the Journées de la Francophonie (formerly the Semaine de la Francophonie) organized by UGA.

The fruit of a multi-year partnership with the CUEF de Grenoble and UGA's RTI, this annual themed event on the Francophonie puts the PhD community in the spotlight during a public scientific morning reserved for it.

Structured around a plenary lecture and a round table bringing together PhD students and specialists in the field, this key event allows the PhD community to showcase its members' research and support their integration on the site, while drawing attention to issues raising major societal and scientific challenges.

_______________________________________________________________________

Round tables, a lecture, storytelling with music, and reading and cooking workshops are on the program, which you can find at: www.univ-grenoble-alpes.fr/francophonie

As staff members, you can in particular take part in the following events:

The whole organizing team wishes you a wonderful celebration of the Francophonie!

Program (PDF, 4.1 MB)


Contacts within LIDILEM

laurence.delperie[at]univ-grenoble-alpes[dot]fr (Laurence Delperié)
Grâce Bosse
Wendingoudi Emile Ouedraogo

Mois du Canada 2021: Screening of the film "The Blinding Sea"

Rencontre / Débat March 22-29, 2021

Every year, the members of UGA's Centre d'études canadiennes offer cultural events open to the general public. Some LIDILEM researchers are affiliated with this centre through their projects and through links with many Canadian universities (the universities of Montréal, Laval, Moncton, Ottawa, Vancouver...).

The Centre d'Etudes Canadiennes de Grenoble opens Canada Month 2021 by streaming the film The Blinding Sea from March 22 to 28, in the original English with French subtitles. The screening will be followed on Monday, March 29 at 6 pm by a videoconference with the author, George Tombs.

The film retraces the career of the famous Norwegian explorer Roald Amundsen, the first to cross the Northwest Passage and to reach the South Pole. Shot across Alaska, the Canadian Arctic Archipelago, the Beaufort Sea, the Southern Ocean and Antarctic waters, it highlights the tension between traditional Inuit knowledge and the European scientific paradigm. An artist, historian and journalist with a doctorate in history from McGill University, George Tombs is passionate about the Arctic and its environment.
A fascinating documentary about the explorer Roald Amundsen



 

Trailer and registration
