
Introduction to Vision-Language Modeling: Challenges and Applications in Technology
Following the popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. Applications of vision-language models (VLMs), from visual assistants to generative models, are likely to change how we interact with technology. These models still face challenges, among them the high-dimensional nature of visual data, which, unlike text, does not break down naturally into discrete tokens. This introduction explains what VLMs are, how they are trained and evaluated, and how they might be extended to video.