Talks and Seminars

3 December 2025

16:00

Luiz Antônio Fávaro Auditorium

Speaker: Aline Villavicencio

Host: Thiago Pardo (taspardo@icmc.usp.br)


  • Abstract: "Large language models have been successfully used for capturing distinct (and very specific) word usages, and therefore could provide an attractive alternative for accurately determining meaning in language. However, these models still face a serious challenge when dealing with non-literal language, like that involved in Multiword Expressions (MWEs) such as idioms (make ends meet), light verb constructions (give a sigh), verb particle constructions (shake up) and noun compounds (loan shark). MWEs are an integral part of the mental lexicon of native speakers, often used to express complex ideas in a simple and conventionalised way accepted by a given linguistic community. Although they may display a wealth of idiosyncrasies, from lexical, syntactic and semantic to statistical, that represent a real challenge for current LLMs, their accurate integration has the potential for improving the precision, naturalness and fluency of many tasks. In this talk, I will present an overview of how advances in LLMs have had an impact on the identification and modelling of idiomaticity and MWEs. I will concentrate on what models seem to incorporate of idiomaticity, as idiomatic interpretation may require knowledge that goes beyond what can be gathered from the individual words of an expression (e.g. “dark horse” as an unknown candidate who unexpectedly succeeds). I will also present an initiative to construct a multilingual idiomatic dataset."


© 2025 Instituto de Ciências Matemáticas e de Computação