RAG-LLM Evaluation & Test Automation for Beginners
LLMs are everywhere! Every business is building its own custom, RAG-based LLM application to improve customer service. But how do engineers test them? Unlike traditional software, AI-based systems need a special evaluation methodology.
This course starts from the ground up, explaining how AI systems (LLMs) work behind the scenes. It then dives deep into LLM evaluation metrics.
From there, it shows you how to use the RAGAS framework to evaluate LLM metrics through scripted examples, letting you check metric benchmark scores with Pytest assertions and design a robust LLM test/evaluation automation framework.
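To give a feel for what these scripted examples look like, here is a minimal sketch of one RAGAS metric check wrapped in a Pytest assertion. It assumes a ragas 0.2.x-style API, pytest-asyncio, langchain-openai, and an OPENAI_API_KEY in your environment; the sample question, answer, and threshold are illustrative placeholders, not the course's actual practice app:

```python
# test_faithfulness.py - minimal sketch of a RAGAS metric check in Pytest.
# Assumes: ragas>=0.2, pytest-asyncio, langchain-openai, OPENAI_API_KEY set.
import pytest
from langchain_openai import ChatOpenAI
from ragas.llms import LangchainLLMWrapper
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import Faithfulness

# The "judge" LLM that RAGAS uses to score the response
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))

@pytest.mark.asyncio
async def test_faithfulness_meets_benchmark():
    # In a real test, these values would come from an API call to your RAG app
    sample = SingleTurnSample(
        user_input="Where is the Eiffel Tower located?",
        response="The Eiffel Tower is located in Paris.",
        retrieved_contexts=["The Eiffel Tower is a landmark in Paris, France."],
    )
    score = await Faithfulness(llm=evaluator_llm).single_turn_ascore(sample)
    assert score > 0.8  # benchmark threshold - tune per application
```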
What will you learn from the course?
- A high-level overview of Large Language Models (LLMs)
- How custom LLMs are built using the Retrieval-Augmented Generation (RAG) architecture
- Common benchmarks/metrics used to evaluate RAG-based LLMs
- An introduction to the RAGAS framework for evaluating/testing LLMs
- Writing practical scripts to automate and assert the metric scores of LLMs
- Automating single-turn and multi-turn interactions with LLMs using the RAGAS framework (see the sketches after this list)
- Generating test data for evaluating LLM metrics using the RAGAS framework
By the end of the course, you will be able to build a RAGAS + Pytest evaluation framework that asserts the metrics of a custom, RAG-based LLM.
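As a taste of the multi-turn scenarios, here is a minimal sketch of scoring a short conversation with a multi-turn-capable RAGAS metric (same assumptions as the sketch above; AspectCritic is one such metric, and the conversation and criterion text are illustrative):

```python
# Minimal multi-turn evaluation sketch. Assumes ragas>=0.2, pytest-asyncio.
import pytest
from langchain_openai import ChatOpenAI
from ragas.llms import LangchainLLMWrapper
from ragas.dataset_schema import MultiTurnSample
from ragas.messages import HumanMessage, AIMessage
from ragas.metrics import AspectCritic

evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))

@pytest.mark.asyncio
async def test_multi_turn_conversation():
    # A short conversation transcript; in practice this comes from your RAG app
    sample = MultiTurnSample(user_input=[
        HumanMessage(content="I want to book a flight to Paris."),
        AIMessage(content="Sure - what date would you like to travel?"),
        HumanMessage(content="Next Friday."),
        AIMessage(content="I found three flights to Paris next Friday."),
    ])
    critic = AspectCritic(
        name="helpfulness",
        definition="The assistant moves the user's request forward in every turn.",
        llm=evaluator_llm,
    )
    verdict = await critic.multi_turn_ascore(sample)  # binary 0 or 1
    assert verdict == 1
```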
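And for test data generation, a sketch of how RAGAS can derive a synthetic test set from your own documents (assuming a ragas 0.2.x-style TestsetGenerator; the docs/ folder name is a placeholder for your RAG app's knowledge-base files):

```python
# Test-data generation sketch. Assumes ragas>=0.2, langchain-community,
# langchain-openai, and a local docs/ folder of source documents.
from langchain_community.document_loaders import DirectoryLoader
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.testset import TestsetGenerator

docs = DirectoryLoader("docs/").load()  # knowledge-base documents

generator = TestsetGenerator(
    llm=LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini")),
    embedding_model=LangchainEmbeddingsWrapper(OpenAIEmbeddings()),
)
testset = generator.generate_with_langchain_docs(docs, testset_size=10)
print(testset.to_pandas())  # generated question/reference pairs
```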
Important Note:
This course covers the top 7 metrics commonly used to evaluate and test LLMs. The same logic can be applied to any other metric evaluation.
Hands-on Experience:
The course provides a practice RAG-LLM for hands-on work, but during the scripting phase you will need a basic OpenAI subscription to access their APIs (a minimal $10 credit will suffice).
Course Prerequisites:
- Python and Pytest basics are required to understand the framework. Two dedicated sections at the end of this course give you the Python and Pytest knowledge needed to follow along.
- Basic knowledge of API testing.
Course Curriculum:
1. What this course offers? FAQs - Must Watch (Video Lecture)
2. Course outcome - Setting the stage of expectations (Text)
3. Introduction to Artificial Intelligence and LLMs - How they work (Video Lecture)
4. Overview of popular LLMs and the challenges with these general LLMs (Video Lecture)
5. What is Retrieval-Augmented Generation (RAG)? Understand its architecture (Video Lecture)
6. End-to-end flow in the RAG architecture and its key advantages (Video Lecture)
10. Course resources download (Text)
11. Demo of the practice RAG LLMs to evaluate and write test automation scripts against (Video Lecture)
12. Understanding the implementation of the practice RAG LLMs to understand their context (Video Lecture)
13. Understand conversational LLM scenarios and how they apply to the RAG architecture (Video Lecture)
14. Understand the metric benchmarks for the document retrieval system in an LLM (Video Lecture)
19. Making a connection to OpenAI using the LangChain framework for RAGAS (Video Lecture)
20. End to end - Evaluate an LLM for the Context Precision metric with single-turn test data (Video Lecture)
21. Metrics document download (Text)
22. Communicate with LLMs using an API POST call to dynamically get responses (Video Lecture)
23. Evaluate an LLM for the Context Recall metric with a RAGAS Pytest example (Video Lecture)
27. Understand the LLM Faithfulness and Response Relevance metrics conceptually (Video Lecture)
28. Build an LLM evaluation script to test Faithfulness benchmarks using RAGAS (Video Lecture)
29. Reading test data from an external JSON file into LLM evaluation scripts (Video Lecture)
30. Understand how metrics are used at different places in the RAG LLM architecture (Video Lecture)
31. Factual Correctness - Build a single test to evaluate multiple LLM metrics (Video Lecture)
32. Understand EvaluationDataset and how it helps in evaluating multiple metrics (Video Lecture)
33. Upload the LLM metric evaluation results to the RAGAS dashboard portal for visual analysis (Video Lecture)
34. How to evaluate a RAG LLM with multi-turn conversational chat history (Video Lecture)
35. Build an LLM evaluation test that can evaluate a multi-turn conversation - example (Video Lecture)
36. How to create test data using the RAGAS framework to evaluate an LLM (Video Lecture)
37. Load external docs into LangChain utils to analyze and extract test data (Video Lecture)
38. Install and configure the NLTK package to scan the LLM documents and generate tests (Video Lecture)
39. Generate rubrics-based criteria scoring to evaluate the quality of LLM responses (Video Lecture)
42. Python hello world program with basics (Video Lecture)
43. Data types in Python and how to get the type at run time (Video Lecture)
44. The List data type and its manipulation operations (Video Lecture)
45. Tuple and Dictionary data types in Python with examples (Video Lecture)
46. If-else conditions in Python with working examples (Video Lecture)
47. How to create dictionaries at run time and add data to them (Video Lecture)
48. How loops work in Python and the importance of code indentation (Video Lecture)
49. Programming examples using the for loop - 1 (Video Lecture)
50. Programming examples using the while loop - 2 (Video Lecture)
51. What are functions? How to use them in Python (Video Lecture)
52. OOP principles: classes and objects in Python (Video Lecture)
53. What is a constructor and its role in object-oriented programming (Video Lecture)
54. Inheritance concepts with examples in Python (Video Lecture)
55. Strings and their functions in Python (Video Lecture)
