Leveraging Natural Language Processing and Large Language Models for Research Exploration and Data Analysis
Dates: 12, 13 and 15 July 2024
Course Introduction / Description:
Generative AI is a type of artificial intelligence that can create new content. Within Generative AI, Large Language Models (LLMs) are deep learning-based transformer architectures (e.g., GPT-4, Generative Pre-trained Transformer 4). They are considered a significant breakthrough in Natural Language Processing (NLP) and AI and have shown substantial potential to transform organizations and society in several ways. An example of an LLM-based application is ChatGPT, which has recently gained widespread attention for its exceptional language generation skills and has demonstrated tremendous capabilities across various domains and tasks, such as question answering and passing examinations (e.g., the Uniform Bar Exam), thereby even challenging our wisdom and cognition.
The primary purpose of this course is to provide knowledge and a deep understanding of the concepts, techniques, and methods that serve as a foundation for LLMs such as ChatGPT. Starting from basic NLP concepts, the course will delve into deep learning architectures for NLP and Generative AI and then into applications and data analysis using LLMs. Subsequently, the course will focus on the opportunities, challenges, and risks associated with Generative AI models such as ChatGPT and their implications for organizations and society.
The course is mainly designed for PhD students who want to use NLP and LLM-based text analysis in their research. It also contains hands-on exercises on these topics using the Python programming language. Participants are expected to have a basic understanding of either the Python or the R programming language and some familiarity with running Python scripts in Jupyter Notebooks. The following is the course outline.
- First, the course starts with some fundamental concepts of machine learning (ML) and NLP. It then focuses on using these techniques for data analysis with supervised and unsupervised approaches, such as text classification and topic modeling.
- Second, it presents the high-level architectures of deep learning, generative models, and LLMs and elaborates on why LLMs like ChatGPT have achieved so many analytical capabilities.
- Third, it will provide a detailed account of these models' capabilities, possible applications to various fields, and how they will impact society and organizations in the future.
- Fourth, it will present how LLMs can be used for text analysis by examining some of the techniques, such as text summarization, text classification and code generation.
- Finally, it will discuss these models' diverse societal impacts and challenges, especially in terms of inequity, misuse, and legal and ethical considerations.
Course Learning Outcomes:
After completing this course, the participants should be able to:
- Demonstrate a fundamental understanding of NLP techniques and how they can be used for the analysis of text.
- Explain the fundamental principles of generative AI and LLMs and how they can be used for data analysis.
- Compare various approaches to using Generative AI and LLMs, demonstrating their practical relevance through real-world applications and case studies.
- Describe the key challenges and opportunities, including issues related to reliability, hallucination, and ethical considerations, in using Generative AI and LLMs in various domains.
Pre-requisites:
A basic understanding of either the Python or the R programming language and the ability to run Python scripts using Jupyter Notebooks.
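As a rough self-check for this prerequisite, participants should be comfortable reading and running a short script like the one below in a Jupyter Notebook cell. It is only an illustrative sketch, not course material: the function names (`tokenize`, `top_words`) and the toy corpus are hypothetical, and it uses only the Python standard library to count word frequencies, a very simple form of text analysis.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_words(documents, n=3):
    """Return the n most frequent tokens across a list of documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts.most_common(n)


# Hypothetical toy corpus, made up for illustration only.
docs = [
    "Language models generate text.",
    "Topic models summarize text collections.",
    "Word vectors represent words as numbers.",
]

# Print the two most frequent tokens with their counts.
print(top_words(docs, n=2))
```

If running this cell and changing `n` or the sentences in `docs` feels straightforward, the Python background is likely sufficient for the hands-on sessions.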
Pedagogy:
The course will be conducted through:
- Presentations
- Videos
Session Plan:
Sessions (1.5 hours each) | Topic & Objective | Study Material | Time
Day-01 (6.5 hours), lunch 12:45-13:30
1 | Fundamentals of ML and NLP – I | Slides, articles and other reading materials | 09:30-11:00
2 | Supervised approaches for NLP: text classification and sentiment analysis – I | Slides, articles and other reading materials | 11:15-12:45
3 | Hands-on 1: text classification and sentiment analysis | Jupyter notebooks and Python scripts | 13:30-15:00
4 | Unsupervised approaches for NLP: topic modeling | Slides, articles and other reading materials | 15:15-16:45
(30 min) | Reflection | | 16:45-17:15
Day-02 (7.5 hours), lunch 14:00-14:30
5 | Hands-on 2: topic modeling and word vectors | Jupyter notebooks and Python scripts | 09:00-10:30
6 | Deep learning models for NLP: vector representation of words, word vectors/embeddings | Slides, articles and other reading materials | 10:45-12:15
7 | Hands-on 3: word vectors/word embeddings | Jupyter notebooks and Python scripts | 12:30-14:00
8 | Introduction to LLMs: transformer architecture and generating text with transformers | Slides, articles and other reading materials | 14:30-16:00
9 | Introduction to prompting, configuring, and fine-tuning LLMs for specific applications | Slides, articles and other reading materials | 16:15-17:45
Day-03 (6 hours), lunch 14:00-14:30
10 | Data analysis using LLMs: text summarization and text classification | Slides, articles and other reading materials | 09:00-10:30
11 | Hands-on 4: text summarization and text classification using LLMs | Jupyter notebooks and Python scripts | 10:45-12:15
12 | LLM use cases, challenges, opportunities, and ethical considerations | Slides, articles and other reading materials | 12:30-14:00
13 | Wrap-up: discussion and the projects | | 14:30-16:00
Evaluation Criteria:
Sr. No. | Component | Individual / Group | Weightage
1 | Class attendance and active participation | Individual | 10%
2 | Online quizzes (2) | Individual | 40%
3 | Final project | Individual | 50%
Total | | | 100%
Profile of Instructors:
Raghava Mukkamala:
Raghava Mukkamala is an associate professor at the Department of Digitalization, Copenhagen Business School (CBS), Denmark. He is also the programme director for the Master's Programme in Data Science at CBS and teaches several courses in Deep Learning and Natural Language Processing. His research is primarily centered on Data Science, Blockchain Technologies, and Cybersecurity. His current research focuses on developing novel computational methods to analyze social media discourse, misinformation, and hate speech by combining formal/mathematical modeling techniques with advanced machine learning algorithms. As part of a pro-bono research collaboration with the United Nations High Commissioner for Refugees (UNHCR), he works on domain adaptation and fine-tuning of Large Language Models to identify hate speech and bias against refugees. Although most of his research is published in IEEE and ACM journals, he has also published several papers in FT-50/AJG-4*/ABDC-A* journals such as the Journal of the Association for Information Systems (JAIS) and the Journal of Management Information Systems (JMIS). Raghava holds a Ph.D. in Theoretical Computer Science and an M.Sc. in Information Technology from the IT University of Copenhagen, Denmark.
Link to homepage: https://www.cbs.dk/en/staff/rrmdigi
Shivshanker Singh Patel:
Dr. Shivshanker is Chair/Head of the Inter-disciplinary Decision Science & Analytics Lab (IDeAL) at the Indian Institute of Management (IIM) Visakhapatnam, where he is currently a faculty member in the Decision Sciences Department. He previously worked as a Manager at Mphasis-NextLabs in the data science domain and as an R&D Engineer with Mahindra & Mahindra Automotive. His interests lie in data science, game theory, forecasting, and optimization applied to scarce resource and logistics management. He is an alumnus of the Indian Institute of Science Bangalore, IIT Roorkee, and NIT Raipur, where he earned his Ph.D., master's, and bachelor's degrees, respectively.
Faculty Development Programs (FDP)
Title | Dates | Topics Covered (Indicative) | Mode (online/offline)
Capacity Building Program on Communication Skills (Online) | June 20-24, 2022 | | Online
Online Workshop on Communication Skills for Musaliar Institute of Management Students | January 24 to February 12, 2022 | | Online
Open FDP on Advanced Multivariate Data Analytics: Moderation and Mediation Analysis using AMOS & Process Macro | October 18-22, 2021 | | Online
Capacity-Building Workshop on Communication Skills | October 4-8, 2021 | | Online
AICTE ATAL Online FDP on Data Analytics for Research and Publication | October 4-8, 2021 | | Online
Open FDP on Digital and Social Media Marketing | September 20-23, 2021 | | Online
Open FDP on Analytics | August 16-17, 2021 | | Online
Open FDP on Pedagogy and Research Methodology | July 5-9, 2021 & July 12-16, 2021 | | Online
AICTE ATAL Online FDP on Data Analytics for Research and Publication | June 14-18, 2021 | | Online
Open FDP on Handling Partial Least Squares - Structural Equation Modelling (PLS-SEM) | April 19-22, 2021 | | Online
National Institute of Business Management (NIBM), Sri Lanka | January 3-4, 2020 | | Offline (Colombo, Sri Lanka)
Central Board of Secondary Education (CBSE): Leadership Development for School Principals | January 27-31, 2020 | | Offline
National Project Implementation Unit (TEQIP) | | | Online & Offline