🤖 EMAI

Overview 💡

The "Embodied AI: Exploring Trends, Challenges, and Opportunities" workshop, set to convene at ICIP 2024 in Abu Dhabi, UAE, serves as a broad-based platform for delving into the convergence of Embodied AI with critical disciplines, including computer vision, language processing, graphics, and robotics. Aimed at enhancing comprehension of AI agents' skills in perception, interaction, and logical reasoning within varied environments, the workshop encourages a cross-disciplinary exchange among premier researchers and industry figures. Participants can look forward to a rich program featuring thought-provoking talks by distinguished experts, a presentation session of the latest research, and dynamic discussions on the evolving landscape of smart, interactive technologies. This forum is poised to be a seminal event for those eager to influence and drive forward the progress in Embodied AI.

The dedicated workshop on Embodied AI is essential due to its unique focus on integrating physical embodiment with AI capabilities, addressing challenges and opportunities not fully explored in the main ICIP conference. It merges computer vision, language processing, and robotics, pushing beyond traditional boundaries to create agents that perceive, interact, and reason within their environments. This specialized forum encourages cross-disciplinary collaboration, fostering advancements that are vital for the development of intelligent, interactive systems, and addressing the gap between current image processing techniques and the future needs of AI research, including foundation models, robotics, and embodied intelligence.

The "Embodied AI: Exploring Trends, Challenges, and Opportunities" workshop at ICIP 2024 in Abu Dhabi, UAE, stands at the confluence of Embodied AI and pivotal areas such as computer vision, language processing, graphics, and robotics. This synthesis is poised to catalyze significant momentum in the field, by bringing the frontier of foundation models, robotics, and embodied AI to the research community.

Schedule ⏰ (tentative)

Time	Topic	Speaker
8:30 - 8:35	Opening Remarks	Yi Fang
8:35 - 9:00	Embodied Visual Navigation	Yi Fang
9:00 - 9:30	Building Multilingual Multimodal Conversational Assistants	Hisham Cholakkal
9:30 - 10:00	Robot Imagination: Affordance Reasoning via Physical Simulation	Gregory Chirikjian
10:00 - 10:30	Coffee Break
10:30 - 11:00	Scene Understanding for Safe and Autonomous Navigation	Amit K. Roy-Chowdhury
11:00 - 11:30	From Video Understanding to Embodied Agents	Ivan Laptev
11:30 - 12:00	Visual Human Motion Analysis	Li Cheng
12:00 - 14:30	Lunch
14:30 - 15:00	Towards Efficient Vision-Language Navigation	Xiaojun Chang
15:00 - 15:30	Data-Centric Approaches to Advancing Embodied AI	Zhiqiang Shen
15:30 - 16:00	To Enable Multimedia Machines to Perceive and Act as Humans Do	Weisi Lin
16:00 - 16:30	Coffee Break
16:30 - 17:00	Large-scale Heterogeneous Scene Modelling and Editing	Dan Xu
17:00 - 17:30	Vision-Language Models and Robotics for Climate Action	Maryam Rahnemoonfar
17:30 - 18:00	Flexible Modality Learning: Modeling Arbitrary Modality Combination via the Mixture-of-Expert Framework	Tianlong Chen

Invited Speakers 🧑‍🏫

Yi Fang

Associate Professor, New York University

Dr. Yi Fang is an Associate Professor of Electrical and Computer Engineering and an Affiliated Associate Professor of Computer Science at NYU and NYU Abu Dhabi, as well as a member of the Center for Artificial Intelligence and Robotics (CAIR) at NYU. After earning his doctorate from Purdue University with a focus on computer graphics and vision, he gained industry experience at Siemens and Riverain Technologies, and academic experience at Vanderbilt University. His research focuses on embodied AI, general-purpose robots, and humanoids, with applications spanning engineering, social science, medicine, and biology. Dr. Fang founded the NYU AIR Lab (Embodied AI and Robotics Lab), a leading center for research in robotics and AI.

Hisham Cholakkal

Assistant Professor of Computer Vision, Mohamed bin Zayed University of Artificial Intelligence

Dr. Hisham Cholakkal is an Assistant Professor at MBZUAI with experience in fundamental research, teaching, and commercial product development across diverse industries. Prior to joining MBZUAI, he worked as a Research Scientist at the Inception Institute of Artificial Intelligence (IIAI) in Abu Dhabi. Before his role at IIAI, he served as a Senior Technical Lead in the Computer Vision and Deep Learning Research team at Mercedes-Benz R&D in India. He has also worked at the Advanced Digital Science Center (ADSC) in Singapore and at the BEL Central Research Lab in India. Cholakkal's research interests include multimodal models, LLMs/VLMs, visual recognition, and AI in healthcare. His recent focus is on building multimodal conversational systems capable of reasoning and interacting seamlessly with humans in real time. He is also interested in the real-world applications of computer vision and machine learning algorithms in healthcare and remote sensing. Cholakkal's research has received several recognitions and funding awards, including the Google Research Award 2023 at MBZUAI, the Meta Llama Impact Innovation Award 2024, the MBZUAI Seed Fund 2024, and the Weizmann Institute of Science - MBZUAI Joint Research Grant 2022-2025. Cholakkal will serve as a General Chair at ACM Multimedia Asia 2026 and has previously acted as Area Chair for ECCV 2024 and BMVC 2024. He was the Primary Organizer of workshops at ICCV 2023, CVPR 2024, NeurIPS 2022, and ACCV 2022. Additionally, he serves as an Associate Editor for journals such as IET Computer Vision and is a program committee member for several top conferences, including CVPR, ICCV, NeurIPS, ICLR, and ECCV.

Gregory S. Chirikjian

Professor & Department Chair, University of Delaware

Dr. Gregory S. Chirikjian is the Willis F. Harrington Professor and Chair of the Mechanical Engineering Department at the University of Delaware. A distinguished roboticist and applied mathematician, he is known for his groundbreaking contributions to robotics, particularly in kinematics, motion planning, and the application of group theory to engineering. His research has advanced the understanding of hyper-redundant robots and stochastic methods on Lie groups, and he is actively involved in embodied AI, focusing on affordance-based reasoning to enhance robotic intelligence. Chirikjian's career is marked by numerous honors, including being named an NSF Young Investigator, a Presidential Faculty Fellow, and a Fellow of both IEEE and ASME. Before joining the University of Delaware in 2024, he held leadership roles at the National University of Singapore.

Amit K. Roy-Chowdhury

Professor and Director of UC Riverside AI Research and Education Institute, University of California, Riverside

Dr. Amit Roy-Chowdhury received his PhD from the University of Maryland, College Park (UMCP) in 2002 and joined the University of California, Riverside (UCR) in 2004 where he is a Professor and UC Presidential Chair of Electrical and Computer Engineering, Cooperating Faculty in Computer Science and Engineering, and Co-Director of the UC Riverside AI Research and Education Institute. He leads the Video Computing Group at UCR, working on foundational principles of computer vision, image processing, and machine learning, with applications in cyber-physical, autonomous and intelligent systems. He has published over 250 papers in peer-reviewed journals and conferences and two monographs: Person Re-identification with Limited Supervision and Camera Networks: The Acquisition and Analysis of Videos Over Wide Areas. He is on the editorial boards of major journals and program committees of the main conferences in his area. He is a Fellow of the IEEE and IAPR, received the Doctoral Dissertation Advising/Mentoring Award from UCR, and the ECE Distinguished Alumni Award from UMCP.

Ivan Laptev

Professor, Mohamed bin Zayed University of Artificial Intelligence

Dr. Ivan Laptev obtained his master's degree in computer science at the Royal Institute of Technology (KTH) in Sweden in 1997 and then worked as a research assistant at the Technical University of Munich. In 2004, he earned his Ph.D. in computer science from KTH and took up a postdoctoral position with the INRIA Vista team in France. He was appointed INRIA Research Scientist in 2005 and INRIA Research Director in 2013. He has been with INRIA Paris since 2009, where he led the WILLOW research team between 2021 and 2023. He has published more than 150 technical papers, most of which appeared in international journals and major peer-reviewed conferences in computer vision, machine learning and robotics. He has graduated 19 Ph.D. students who now pursue careers in industrial and academic research labs. He also co-founded a computer vision company, VisionLabs, which has grown to 250 people. Laptev has been actively involved in the scientific community, serving as an associate editor of IJCV and TPAMI, and as a program chair for CVPR 2018, ICCV 2023 and ACCV 2024. He will also serve as a General Chair of ICCV 2029, bringing the international computer vision community to the UAE. He has co-organized several tutorials, workshops and challenges at major computer vision conferences. He has also co-organized a series of INRIA summer schools on computer vision and machine learning (2010–2013) and Machines Can See summits (2017–2023). He received an ERC Starting Grant in 2012 and was awarded a Helmholtz prize for significant impact on computer vision in 2017.

Li Cheng

Associate Professor, University of Alberta

Dr. Li Cheng is an associate professor with the Department of Electrical and Computer Engineering, University of Alberta. He also holds an adjunct position with A*STAR, Singapore, where he has led a group in Machine Learning for Bioimage Analysis at the Bioinformatics Institute. Prior to joining the University of Alberta in 2018, he worked at A*STAR, Singapore, TTI-Chicago, USA, and NICTA, Australia. He received his BSc degree in Computer Science from Jilin University in 1996, his M.Eng. degree from Nankai University in 1999, and his PhD in Computing Science from the University of Alberta in 2004. His research expertise is mainly in computer vision and machine learning. He is a member of the Institute of Electrical and Electronics Engineers (IEEE), the Association for Computing Machinery (ACM), and the Association for the Advancement of Artificial Intelligence (AAAI).

Xiaojun Chang

Professor, Australian Artificial Intelligence Institute (AAII) and Visiting Professor at Mohamed bin Zayed University of Artificial Intelligence

Dr. Xiaojun Chang is a Professor at the Australian Artificial Intelligence Institute (AAII) at UTS and a Visiting Professor at the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI). He directs the Recognition, Learning, and Reasoning Lab (ReLER), focusing on AI, computer vision, multimedia, and machine learning, particularly for analyzing visual, acoustic, and textual signals in applications like video surveillance. Before joining UTS in 2022, Dr. Chang held positions at Carnegie Mellon University, Monash University, and RMIT. He has secured over $3 million in research funding and made significant contributions to video analysis and multimedia retrieval, including healthcare innovations. He is a Clarivate Analytics Highly Cited Researcher (2019-2023), and his work, including an automatic report generation system for critically ill COVID-19 patients, has gained international recognition. His team has won prestigious global challenges, and he has published over 200 peer-reviewed papers. Committed to advancing AI for real-world applications, Dr. Chang regularly collaborates with industry to develop intelligent systems that benefit humanity.

Zhiqiang Shen

Assistant Professor, Mohamed bin Zayed University of Artificial Intelligence

Dr. Zhiqiang Shen is an Assistant Professor of Machine Learning at MBZUAI, specializing in efficient deep learning, machine learning, and computer vision. His research focuses on developing deep learning methods for image recognition, object detection, and designing efficient architectures with parameter-efficient fine-tuning strategies. Prior to MBZUAI, Dr. Shen was an assistant research professor at Hong Kong University of Science and Technology (HKUST) and a postdoctoral researcher at CyLab, Carnegie Mellon University. His recent work includes low-bit neural networks, knowledge distillation, and efficient architectures for CNNs and transformers, with a focus on unsupervised learning and image understanding.

Weisi Lin

Associate Dean (Research), College of Computing & Data Science, Nanyang Technological University

Dr. Weisi Lin is a distinguished researcher and educator, holding a PhD in Computer Vision from King's College London, as well as a BSc in Electronics and MSc in Digital Signal Processing from Sun Yat-Sen University, China. He has held academic and research positions at institutions such as Sun Yat-Sen University, Bath University, and the National University of Singapore, as well as leadership roles in Singapore's Institute for Infocomm Research. With over 400 refereed publications, 16 patents, and contributions to international standards, Dr. Lin has led more than 10 major projects in digital multimedia technology. His research focuses on perception-inspired signal modeling, visual quality evaluation, video compression, and multimedia systems, balancing academic theory with industrial application.

Dan Xu

Assistant Professor, Department of Computer Science and Engineering, Hong Kong University of Science and Technology (HKUST)

Dr. Dan Xu is an Assistant Professor in the Department of Computer Science and Engineering at HKUST, with a research focus on computer vision, multimedia, and machine learning. He was previously a Postdoctoral Research Fellow in the Visual Geometry Group at the University of Oxford, working under Prof. Andrea Vedaldi and Prof. Andrew Zisserman, and earned his PhD from the University of Trento under Prof. Nicu Sebe. Dr. Xu's research interests include deep learning, multi-modal and multi-task learning, with applications in 2D/3D perception, scene understanding, dense scene prediction, and large-scale 3D modeling, as well as human- and scene-centric generation and editing.

Maryam Rahnemoonfar

Associate Professor, Director of the Computer Vision and Remote Sensing Laboratory (Bina Lab), Lehigh University

Dr. Maryam Rahnemoonfar is a Tenured Associate Professor of Computer Science and Engineering at Lehigh University's P.C. Rossin College of Engineering and Applied Science, with a joint appointment in Civil and Environmental Engineering. She directs the Computer Vision and Remote Sensing Laboratory (Bina Lab), where her research spans Data Science for Sustainability, Deep Learning, Computer Vision, AI for Social Good, and Remote Sensing. Her work focuses on developing machine learning algorithms for heterogeneous sensors such as Radar, Sonar, and Multi-spectral. Dr. Rahnemoonfar has secured multiple prestigious awards, including the NSF HDR Institute Award and Amazon Machine Learning Award. Passionate about interdisciplinary research for environmental and humanitarian solutions, she has led numerous projects and served on the National Academy of Sciences' workshop on Antarctic research technologies. She earned her Ph.D. in Computer Science from the University of Salford, UK, and previously held academic positions at UMBC and Texas A&M University-Corpus Christi.

Tianlong Chen

Assistant Professor, University of North Carolina at Chapel Hill

Dr. Tianlong Chen received his Ph.D. degree in Electrical and Computer Engineering from the University of Texas at Austin, TX, USA, in 2023. He joins The University of North Carolina at Chapel Hill as an Assistant Professor of Computer Science in Fall 2024. Before that, he was a Postdoctoral Researcher at Massachusetts Institute of Technology (CSAIL@MIT), Harvard (BMI@Harvard), and the Broad Institute of MIT & Harvard in 2023-2024. His research focuses on building accurate, trustworthy, and efficient machine learning systems. His most recent work is devoted to (A) important machine learning problems - sparsity, robustness, learning to optimize, graph learning, and diffusion models; and (B) interdisciplinary scientific challenges - bioengineering and quantum computing. He received the IBM Ph.D. Fellowship, the Adobe Ph.D. Fellowship, the Graduate Dean's Prestigious Fellowship, the AdvML Rising Star award, and the Best Paper Award from the inaugural Learning on Graphs (LoG) Conference 2022.

Call for Papers 📝

We warmly invite submissions of high-quality research papers, not exceeding 4 pages (excluding references), that focus on the following themes of Embodied AI:

Embodied Intelligence Through World Models

Embodied Navigation

Embodied Manipulation

Embodied AI for Multimodal Processing

Visual Rearrangement

Foundation Models for Embodied AI

Sim-to-Real Transfer

AI with Human Interaction

Generative Models for Embodied AI

Simulation Environments

Embodied Question Answering

Selected papers will be presented as either posters or spotlight talks during the workshop. Additionally, these papers will be published and made accessible via IEEE Xplore, in line with ICIP's guidelines for workshop contributions. Please note that, as per ICIP regulations, at least one author of each accepted paper is required to complete an in-person registration for the conference.

The submission deadline is April 25, 2024 (Anywhere on Earth). Papers should be no longer than 4 pages (excluding references) and styled in the ICIP format.

Submission Link : Link