parent 91679dc8e8
commit 23e7549ad1
28 changed files with 223 additions and 29 deletions
@@ -1,3 +1,8 @@

# Gemini Function Calling

Gemini function calling lets you hook the Gemini language model to real code in a safe and simple way. You first list the functions you want it to use, each with a name, a short note about what it does, and a JSON schema for the needed arguments. When the user speaks, Gemini checks this list and, if a match makes sense, answers with a tiny JSON block that holds the chosen function name and the filled-in arguments. Your program then runs that function, sends the result back, and the chat moves on. Because the reply is strict JSON and not free text, you do not have to guess at what the model means, and you avoid many errors. This flow lets you build agents that pull data, call APIs, or carry out long action chains while keeping control of business logic on your side.
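
As a rough illustration, the sketch below wires a plain Python function into a Gemini chat using the `google-generativeai` SDK's automatic function-calling helper. The model name and the `get_weather` tool are placeholders chosen for the example, not something prescribed by the docs.

```python
# Minimal sketch, assuming the google-generativeai SDK; get_weather is an invented example tool.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

def get_weather(city: str) -> dict:
    """Return a fake weather report for a city (stand-in for a real API call)."""
    return {"city": city, "forecast": "sunny", "temp_c": 24}

model = genai.GenerativeModel("gemini-1.5-flash", tools=[get_weather])
chat = model.start_chat(enable_automatic_function_calling=True)

# The SDK sends the schema, runs get_weather when the model asks for it,
# and feeds the JSON result back before the final text answer is produced.
response = chat.send_message("What is the weather like in Paris?")
print(response.text)
```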

Visit the following resources to learn more:

- [@official@Function Calling with the Gemini API](https://ai.google.dev/gemini-api/docs/function-calling)
- [@article@Understanding Function Calling in Gemini](https://medium.com/google-cloud/understanding-function-calling-in-gemini-3097937f1905)
@@ -1 +1,9 @@

# Git and Terminal Usage

Git and the terminal are key tools for AI agents and developers. Git lets you track changes in code, work with branches, and collaborate safely with others. It stores snapshots of your work so you can undo mistakes or merge ideas. The terminal (command line) lets you move around files, run programs, set up servers, and control tools like Git quickly without a GUI.
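
As a small illustration of driving these tools from code, the hedged sketch below shells out to `git` with Python's `subprocess` module, the same way an agent or automation script might; the commands shown are ordinary git usage, and the repository is assumed to already exist.

```python
# Minimal sketch: calling git through the terminal from Python via subprocess.
import subprocess

def run(cmd: list[str]) -> str:
    """Run a terminal command and return its output, raising on failure."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()

print(run(["git", "status", "--short"]))              # see what changed
run(["git", "add", "-A"])                             # stage everything
run(["git", "commit", "-m", "Describe the change"])   # snapshot the work
print(run(["git", "log", "--oneline", "-5"]))         # last five commits
```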

Visit the following resources to learn more:

- [@official@Git Basics](https://git-scm.com/doc)
- [@official@Introduction to the Terminal](https://ubuntu.com/tutorials/command-line-for-beginners#1-overview)
- [@video@Git and Terminal Basics Crash Course (YouTube)](https://www.youtube.com/watch?v=HVsySz-h9r4)
@@ -1,3 +1,9 @@

# Haystack

Haystack is an open-source Python framework that helps you build search and question-answering agents fast. You connect your data sources, pick a language model, and set up pipelines that find the best answer to a user’s query. Haystack handles tasks such as indexing documents, retrieving passages, running the model, and ranking results. It works with many back-ends like Elasticsearch, OpenSearch, FAISS, and Pinecone, so you can scale from a laptop to a cluster. You can add features like summarization, translation, and document chat by dropping extra nodes into the pipeline. The framework also offers REST APIs, a web UI, and clear tutorials, making it easy to test and deploy your agent in production.
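
The sketch below shows the general shape of a pipeline, assuming the Haystack 2.x in-memory document store and BM25 retriever; production back-ends such as Elasticsearch or Pinecone are swapped in the same way.

```python
# Minimal sketch, assuming Haystack 2.x (pip install haystack-ai).
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever

# Index a few documents in an in-memory store.
store = InMemoryDocumentStore()
store.write_documents([
    Document(content="Haystack builds search and question-answering pipelines."),
    Document(content="It supports back-ends such as Elasticsearch, OpenSearch, and Pinecone."),
])

# Build a one-step retrieval pipeline and query it.
pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=store))
result = pipeline.run({"retriever": {"query": "What back-ends does Haystack support?"}})
print(result["retriever"]["documents"][0].content)
```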

Visit the following resources to learn more:

- [@official@Haystack](https://haystack.deepset.ai/)
- [@official@Haystack Overview](https://docs.haystack.deepset.ai/docs/intro)
- [@opensource@deepset-ai/haystack](https://github.com/deepset-ai/haystack)
@@ -1,3 +1,9 @@

# Helicone

Helicone is an open-source tool that helps you watch and understand how your AI agents talk to large language models. You send your model calls through Helicone’s proxy, and it records each request and response without changing the result. A clear web dashboard then shows logs, latency, token counts, error rates, and cost for every call. You can filter, search, and trace a single user journey, which makes it easy to spot slow prompts or rising costs. Helicone also lets you set alerts and share traces with your team, so problems get fixed fast and future changes are safer.
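
Because Helicone sits in front of the model API as a proxy, integration is usually just a base-URL change. The sketch below assumes the OpenAI Python SDK and Helicone's documented `oai.helicone.ai` gateway; confirm the exact header names against Helicone's quick-start docs.

```python
# Minimal sketch: routing OpenAI calls through Helicone's proxy so they get logged.
from openai import OpenAI

client = OpenAI(
    api_key="OPENAI_API_KEY",
    base_url="https://oai.helicone.ai/v1",                  # proxy instead of api.openai.com
    default_headers={"Helicone-Auth": "Bearer HELICONE_API_KEY"},
)

# The request behaves exactly as before; Helicone records latency, tokens, and cost.
reply = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(reply.choices[0].message.content)
```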

Visit the following resources to learn more:

- [@official@Helicone](https://www.helicone.ai/)
- [@official@Helicone OSS LLM Observability](https://docs.helicone.ai/getting-started/quick-start)
- [@opensource@Helicone/helicone](https://github.com/Helicone/helicone)
@@ -1,3 +1,10 @@

# Human in the Loop Evaluation

Human-in-the-loop evaluation checks an AI agent by letting real people judge its output and behavior. Instead of trusting only automated scores, testers invite users, domain experts, or crowd workers to watch tasks, label answers, flag errors, and rate clarity, fairness, or safety. Their feedback shows problems that numbers alone miss, such as hidden bias, confusing language, or actions that feel wrong to a person. Teams study these notes, adjust the model, and run another round, repeating until the agent meets quality and trust goals. Mixing human judgment with data leads to a system that is more accurate, useful, and safe for everyday use.
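
A hedged sketch of the mechanical side of this loop: logged agent answers are rated by people, and the ratings are summarized before the next iteration. The rating fields and scale here are invented purely for illustration.

```python
# Minimal sketch: aggregate human ratings of agent outputs (invented data layout).
from statistics import mean

reviews = [
    {"answer_id": 1, "rater": "expert_a", "correct": True,  "clarity": 4, "safe": True},
    {"answer_id": 1, "rater": "expert_b", "correct": True,  "clarity": 5, "safe": True},
    {"answer_id": 2, "rater": "expert_a", "correct": False, "clarity": 2, "safe": True},
]

accuracy = mean(1.0 if r["correct"] else 0.0 for r in reviews)
clarity = mean(r["clarity"] for r in reviews)
flagged = [r["answer_id"] for r in reviews if not r["safe"]]

print(f"human-judged accuracy: {accuracy:.0%}, avg clarity: {clarity:.1f}/5, unsafe answers: {flagged}")
```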

Visit the following resources to learn more:

- [@article@Human in the Loop · Cloudflare Agents](https://developers.cloudflare.com/agents/concepts/human-in-the-loop/)
- [@article@What is Human-in-the-Loop: A Guide](https://logifusion.com/what-is-human-in-the-loop-htil/)
- [@article@Human-in-the-Loop ML](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-human-review-workflow.html)
- [@article@The Importance of Human Feedback in AI (Hugging Face Blog)](https://huggingface.co/blog/rlhf)
@@ -1,3 +1,9 @@

# Integration Testing for Flows

Integration testing for flows checks that an AI agent works well from the first user input to the final action, across every step in between. It joins all parts of the system—natural-language understanding, planning, memory, tools, and output—and runs them together in real scenarios. Test cases follow common and edge-case paths a user might take. The goal is to catch errors that only appear when parts interact, such as wrong data passed between modules or timing issues. Good practice includes building automated test suites, using real or mock services, and logging each step for easy debugging. When integration tests pass, you gain confidence that the whole flow feels smooth and reliable for users.
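
As an illustration, the pytest-style sketch below runs a tiny agent flow end to end against a mock tool; the agent and tool are placeholders rather than a specific framework, but the structure (real flow, faked external service, assertions on the data handed between steps) is the same in larger suites.

```python
# Minimal sketch: an end-to-end flow test with a mocked external tool (pytest style).
def agent_flow(user_input: str, tool) -> str:
    """Toy flow: parse the city, call the tool, format the final answer."""
    city = user_input.rsplit(" ", 1)[-1].strip("?")
    return f"The weather in {city} is {tool(city)}."

def test_weather_flow_end_to_end():
    calls = []

    def fake_tool(city):                      # mock service used instead of the real API
        calls.append(city)
        return "sunny"

    answer = agent_flow("What is the weather in Paris?", fake_tool)
    assert answer == "The weather in Paris is sunny."
    assert calls == ["Paris"]                 # the right data was passed between steps
```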

Visit the following resources to learn more:

- [@article@Integration Testing for AI-based Features with Humans](https://www.microsoft.com/en-us/research/publication/hint-integration-testing-for-ai-based-features-with-humans-in-the-loop/)
- [@article@Integration Testing and Unit Testing in AI](https://www.aviator.co/blog/integration-testing-and-unit-testing-in-the-age-of-ai/)
- [@article@Integration Testing](https://www.guru99.com/integration-testing.html)
@@ -1,3 +1,9 @@

# Iterate and Test your Prompts

After you write a first prompt, treat it as a draft, not the final version. Run it with the AI, check the output, and note what is missing, wrong, or confusing. Change one thing at a time, such as adding an example, a limit on length, or a tone request. Test again and see if the result gets closer to what you want. Keep a record of each change and its effect, so you can learn patterns that work. Stop when the output is clear, correct, and repeatable. This loop of try, observe, adjust, and retry turns a rough prompt into a strong one.
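
A small sketch of that loop in code: run a few prompt variants against the same model call, log the outputs, and compare them side by side. The `ask` helper is a stand-in for whichever LLM client you actually use.

```python
# Minimal sketch: try prompt variants one change at a time and record the results.
def ask(prompt: str) -> str:
    """Stand-in for a real LLM call (e.g., an OpenAI or Gemini client)."""
    return f"<model output for: {prompt[:40]}...>"

variants = {
    "v1_baseline": "Summarize this article.",
    "v2_length":   "Summarize this article in 3 bullet points.",
    "v3_tone":     "Summarize this article in 3 bullet points for a non-technical reader.",
}

log = []
for name, prompt in variants.items():
    output = ask(prompt)
    log.append({"variant": name, "prompt": prompt, "output": output})
    print(name, "->", output)

# Keep the log so you can see which single change moved the output closer to what you want.
```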

Visit the following resources to learn more:

- [@article@Master Iterative Prompting: A Guide](https://blogs.vreamer.space/master-iterative-prompting-a-guide-to-more-effective-interactions-with-ai-50a736eaec38)
- [@course@Prompt Engineering Best Practices](https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/)
- [@video@Prompt Engineering: The Iterative Process](https://www.youtube.com/watch?v=dOxUroR57xs)
@@ -1,3 +1,11 @@

# LangChain

LangChain is a Python and JavaScript library that helps you put large language models to work in real products. It gives ready-made parts for common agent tasks such as talking to many tools, keeping short-term memory, and calling an external API when the model needs fresh data. You combine these parts like Lego blocks: pick a model, add a prompt template, chain the steps, then wrap the chain in an “agent” that can choose what step to run next. Built-in connectors link to OpenAI, Hugging Face, vector stores, and SQL databases, so you can search documents or pull company data without writing a lot of glue code. This lets you move fast from idea to working bot, while still letting you swap out parts if your needs change.
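
A minimal sketch of that "Lego block" style, assuming the `langchain-core` and `langchain-openai` packages: a prompt template piped into a chat model and a string parser.

```python
# Minimal sketch, assuming langchain-core + langchain-openai are installed
# and OPENAI_API_KEY is set in the environment.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
    "You are a helpful assistant. Answer briefly: {question}"
)
model = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Compose the pieces into a chain: prompt -> model -> plain string.
chain = prompt | model | StrOutputParser()

print(chain.invoke({"question": "What does LangChain help you build?"}))
```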

Visit the following resources to learn more:

- [@official@LangChain Documentation](https://python.langchain.com/docs/introduction/)
- [@opensource@langchain-ai/langchain](https://github.com/langchain-ai/langchain)
- [@article@Building Applications with LLMs using LangChain](https://www.pinecone.io/learn/series/langchain/)
- [@article@AI Agents with LangChain and LangGraph](https://www.udacity.com/course/ai-agents-with-langchain-and-langgraph--cd13764)
- [@video@LangChain Crash Course - Build LLM Apps Fast (YouTube)](https://www.youtube.com/watch?v=nAmC7SoVLd8)
@@ -1,3 +1,10 @@

# LangFuse

LangFuse is a free, open-source tool that lets you watch and debug AI agents while they run. You add a small code snippet to your agent, and LangFuse starts collecting every prompt, model response, and user input. It shows this data as neat timelines, so you can see each step the agent takes, how long each call takes, what it costs, and where errors happen. You can tag runs, search through them, and compare different prompt versions to find what works best. The dashboard also tracks token usage and latency, helping you cut cost and improve speed. Because LangFuse stores data in your own database, you keep full control of sensitive text. It works well with popular frameworks like LangChain and can send alerts to Slack or email when something breaks.
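
The sketch below shows roughly how tracing is added, assuming the Langfuse Python SDK's `observe` decorator and the usual API-key environment variables; check the current docs for exact import paths, which have shifted between SDK versions.

```python
# Minimal sketch, assuming the langfuse Python SDK's @observe decorator
# and LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY set in the environment.
from langfuse.decorators import observe

@observe()                      # records inputs, outputs, and timing as a trace step
def summarize(text: str) -> str:
    # call your LLM here; the decorator captures the step either way
    return text[:50] + "..."

@observe()                      # parent span: the whole agent run shows up as one timeline
def agent_run(question: str) -> str:
    draft = summarize(question)
    return f"Answer based on: {draft}"

print(agent_run("Explain what Langfuse traces look like for a multi-step agent."))
```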

Visit the following resources to learn more:

- [@official@LangFuse](https://langfuse.com/)
- [@official@LangFuse Documentation](https://langfuse.com/docs)
- [@opensource@langfuse/langfuse](https://github.com/langfuse/langfuse)
- [@article@Langfuse: Open Source LLM Engineering Platform](https://www.ycombinator.com/companies/langfuse)
@@ -1,3 +1,10 @@

# LangSmith

LangSmith is a web tool that helps you see and fix what your AI agents are doing. It records each call that the agent makes to a language model, the input it used, and the answer it got back. You can replay any step, compare different prompts, measure cost, speed, and error rates, and tag runs for easy search. It also lets you store test sets and run quick checks so you know if new code makes the agent worse. By showing clear traces and charts, LangSmith makes it easier to debug, improve, and trust AI systems built with LangChain or other frameworks.
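
A rough sketch of adding tracing to plain Python functions, assuming the `langsmith` package's `traceable` decorator and an API key in the environment; LangChain apps can instead be traced by setting the documented tracing environment variables.

```python
# Minimal sketch, assuming the langsmith SDK and a LangSmith API key in the environment.
from langsmith import traceable

@traceable(name="lookup")           # each call is recorded as a run in LangSmith
def lookup(query: str) -> str:
    return f"stub search result for '{query}'"

@traceable(name="agent_answer")     # nested calls appear as a single trace tree
def agent_answer(question: str) -> str:
    context = lookup(question)
    return f"Based on: {context}"

print(agent_answer("What does LangSmith record?"))
```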

Visit the following resources to learn more:

- [@official@LangSmith](https://smith.langchain.com/)
- [@official@LangSmith Documentation](https://docs.smith.langchain.com/)
- [@official@Harden your application with LangSmith Evaluation](https://www.langchain.com/evaluation)
- [@article@What is LangSmith and Why should I care as a developer?](https://medium.com/around-the-prompt/what-is-langsmith-and-why-should-i-care-as-a-developer-e5921deb54b5)
@@ -1,3 +1,10 @@

# LangSmith

LangSmith is a tool that helps you see how well your AI agents work. It lets you record every step the agent takes, from the first input to the final answer. You can replay these steps to find places where the agent goes wrong. LangSmith also lets you create test sets with real user prompts and compare new model versions against them. It shows clear numbers on speed, cost, and accuracy so you can spot trade-offs. Because LangSmith links to LangChain, you can add it with only a few extra lines of code. The web dashboard then gives charts, error logs, and side-by-side result views. This makes it easy to track progress, fix bugs, and prove that your agent is getting better over time.
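
To illustrate the test-set side, the hedged sketch below registers a small dataset of real prompts with the `langsmith` client so later agent versions can be evaluated against it; the exact method names and fields should be confirmed against the current SDK docs.

```python
# Minimal sketch, assuming the langsmith SDK and a LangSmith API key in the environment.
from langsmith import Client

client = Client()
dataset = client.create_dataset(
    dataset_name="faq-regression",
    description="Real user prompts used to compare agent versions",
)

examples = [
    ("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
    ("What is your refund policy?", "Refunds are available within 30 days of purchase."),
]
for question, expected in examples:
    client.create_example(
        inputs={"question": question},
        outputs={"answer": expected},
        dataset_id=dataset.id,
    )
# New agent versions can now be run against this dataset and compared side by side.
```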

Visit the following resources to learn more:

- [@official@LangSmith](https://smith.langchain.com/)
- [@official@LangSmith Documentation](https://docs.smith.langchain.com/)
- [@official@Harden your application with LangSmith Evaluation](https://www.langchain.com/evaluation)
- [@article@What is LangSmith and Why should I care as a developer?](https://medium.com/around-the-prompt/what-is-langsmith-and-why-should-i-care-as-a-developer-e5921deb54b5)
@@ -1,3 +1,12 @@

# LlamaIndex

LlamaIndex is an open-source Python toolkit that helps you give a language model access to your own data. You load files such as PDFs, web pages, or database rows. The toolkit breaks the text into chunks, turns them into vectors, and stores them in a chosen vector store like FAISS or Pinecone. When a user asks a question, LlamaIndex finds the best chunks, adds them to the prompt, and sends the prompt to the model. This flow is called retrieval-augmented generation and it lets an agent give answers grounded in your content. The library offers simple classes for loading, indexing, querying, and composing tools, so you write less boilerplate code. It also works with other frameworks, including LangChain, and supports models from OpenAI or Hugging Face. With a few lines of code you can build a chatbot, Q&A system, or other agent that knows your documents.
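
A minimal sketch of that retrieval-augmented flow, assuming LlamaIndex 0.10+ (the `llama_index.core` package), an OpenAI key in the environment, and your documents placed in a local `data/` folder.

```python
# Minimal sketch, assuming llama-index >= 0.10 and OPENAI_API_KEY in the environment.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# 1) Load your own files (PDFs, text, etc.) from a local folder.
documents = SimpleDirectoryReader("data").load_data()

# 2) Chunk, embed, and store them in an in-memory vector index.
index = VectorStoreIndex.from_documents(documents)

# 3) Retrieve the best chunks and let the model answer grounded in them.
query_engine = index.as_query_engine()
print(query_engine.query("What do these documents say about the refund policy?"))
```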

Visit the following resources to learn more:

- [@official@LlamaIndex](https://llamaindex.ai/)
- [@official@LlamaIndex Documentation](https://docs.llamaindex.ai/)
- [@official@What is LlamaIndex.TS](https://ts.llamaindex.ai/docs/llamaindex)
- [@opensource@run-llama/llama_index](https://github.com/run-llama/llama_index)
- [@article@What is LlamaIndex? - IBM](https://www.ibm.com/think/topics/llamaindex)
- [@article@LlamaIndex - Hugging Face](https://huggingface.co/llamaindex)
@@ -1,3 +1,9 @@

# LLM Native "Function Calling"

LLM native “function calling” lets a large language model decide when to run a piece of code and which inputs to pass to it. You first tell the model what functions are available. For each one you give a short name, a short description, and a list of arguments with their types. During a chat, the model can answer in JSON that matches this schema instead of plain text. Your wrapper program reads the JSON, calls the real function, and then feeds the result back to the model so it can keep going. This loop helps an agent search the web, look up data, send an email, or do any other task you expose. Because the output is structured, you get fewer mistakes than when the model tries to write raw code or natural-language commands.
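
In code, the loop is provider-agnostic: declare a schema, let the model return structured JSON, dispatch to your handler, and feed the result back. The sketch below stubs the model call so the dispatch logic stays visible; the tool name and schema are invented for the example.

```python
# Minimal, provider-agnostic sketch of the function-calling loop (model call is stubbed).
import json

TOOLS = {
    "get_stock_price": {
        "description": "Look up the latest price for a ticker symbol",
        "parameters": {"ticker": "string"},
        "handler": lambda ticker: {"ticker": ticker, "price": 123.45},
    }
}

def fake_model_reply(user_message: str) -> str:
    """Stand-in for the LLM: it answers with JSON naming a tool and its arguments."""
    return json.dumps({"function": "get_stock_price", "arguments": {"ticker": "ACME"}})

call = json.loads(fake_model_reply("How is ACME stock doing?"))
handler = TOOLS[call["function"]]["handler"]           # your code stays in control
result = handler(**call["arguments"])                  # run the real work
print("send back to the model:", json.dumps(result))   # the model continues with fresh facts
```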

Visit the following resources to learn more:

- [@article@A Comprehensive Guide to Function Calling in LLMs](https://thenewstack.io/a-comprehensive-guide-to-function-calling-in-llms/)
- [@article@Function Calling with LLMs | Prompt Engineering Guide](https://www.promptingguide.ai/applications/function_calling)
- [@article@Function Calling with Open-Source LLMs](https://medium.com/@rushing_andrei/function-calling-with-open-source-llms-594aa5b3a304)
@@ -1,3 +1,10 @@

# Local Desktop

A Local Desktop deployment means running the MCP server directly on your own computer instead of a remote cloud or server. You install the MCP software, needed runtimes, and model files onto your desktop or laptop. The server then listens on a local address like `127.0.0.1:8000`, accessible only from the same machine unless you open ports manually. This setup is great for fast tests, personal demos, or private experiments since you keep full control and avoid cloud costs. However, it's limited by your hardware's speed and memory, and others cannot access it without tunneling tools like ngrok or local port forwarding.
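
As an illustration of a local setup, the hedged sketch below defines a tiny MCP server with the official `mcp` Python SDK's FastMCP helper and runs it entirely on your own machine; the server name and `add` tool are invented for the example.

```python
# Minimal sketch, assuming the official `mcp` Python SDK (pip install "mcp[cli]").
# Runs entirely on your own machine; nothing is exposed unless you forward a port.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("local-demo")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers (a stand-in for any local capability you want to expose)."""
    return a + b

if __name__ == "__main__":
    # The default stdio transport is the usual choice for local desktop clients.
    mcp.run()
```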

Visit the following resources to learn more:

- [@article@Build a Simple Local MCP Server](https://blog.stackademic.com/build-simple-local-mcp-server-5434d19572a4)
- [@article@How to Build and Host Your Own MCP Servers in Easy Steps](https://collabnix.com/how-to-build-and-host-your-own-mcp-servers-in-easy-steps/)
- [@article@Expose localhost to Internet](https://ngrok.com/docs)
- [@video@Run a Local Server on Your Machine](https://www.youtube.com/watch?v=ldGl6L4Vktk)
@@ -1,3 +1,11 @@

# Long Term Memory

Long term memory in an AI agent stores important information for future use, like a digital notebook. It saves facts, past events, user preferences, and learned skills so the agent can make smarter and more consistent decisions over time. Unlike short-term memory, this data survives across sessions. When a similar situation comes up, the agent can look back and use what it already knows. Long term memory usually lives in a database, file system, or vector store and may hold text, numbers, embeddings, or past conversation states. Good management of long-term memory is key for building agents that feel personalized and get better with experience.
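
A hedged sketch of the simplest possible long-term store: facts written to a local SQLite database so they survive restarts. Real agents often layer embeddings and similarity search on top, but the save-and-recall shape is the same.

```python
# Minimal sketch: persistent key-value memory in SQLite (survives across sessions).
import sqlite3

class LongTermMemory:
    def __init__(self, path: str = "agent_memory.db"):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS facts (topic TEXT, fact TEXT)")

    def remember(self, topic: str, fact: str) -> None:
        self.db.execute("INSERT INTO facts VALUES (?, ?)", (topic, fact))
        self.db.commit()

    def recall(self, topic: str) -> list[str]:
        rows = self.db.execute("SELECT fact FROM facts WHERE topic = ?", (topic,))
        return [fact for (fact,) in rows]

memory = LongTermMemory()
memory.remember("user_preferences", "Prefers answers in bullet points")
print(memory.recall("user_preferences"))   # still available on the next run
```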

Visit the following resources to learn more:

- [@article@Long Term Memory in AI Agents](https://medium.com/@alozie_igbokwe/ai-101-long-term-memory-in-ai-agents-35f87f2d0ce0)
- [@article@Memory Management in AI Agents](https://python.langchain.com/docs/how_to/chatbots_memory/)
- [@article@Storing and Retrieving Knowledge for Agents](https://www.pinecone.io/learn/langchain-retrieval-augmentation/)
- [@article@Short-Term vs Long-Term Memory in AI Agents](https://adasci.org/short-term-vs-long-term-memory-in-ai-agents/)
- [@video@Building Brain-Like Memory for AI Agents](https://www.youtube.com/watch?v=VKPngyO0iKg)
@@ -1,3 +1,10 @@

# Manual (from scratch)

Building an AI agent from scratch means writing every part of the system yourself, without ready-made libraries. You define how the agent senses inputs, stores memory, makes decisions, and learns over time. First, you pick a clear goal, like solving puzzles or chatting. Then you code the inputs (keyboard, mouse, text), decision logic (rules or neural networks), and memory (saving facts from past events). Testing is critical: you run the agent, watch its actions, debug, and improve. Though it takes longer, this approach gives deep understanding and full control over how the agent works and evolves.
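
A toy sketch of that sense-decide-act loop written with no frameworks at all; the "world", rules, and responses are invented purely to show the shape of the code.

```python
# Minimal from-scratch agent: observe -> decide -> act -> remember, with no libraries.
class ScratchAgent:
    def __init__(self, goal: str):
        self.goal = goal
        self.memory: list[str] = []          # past observations the agent can reuse

    def decide(self, observation: str) -> str:
        # Hand-written rules; a learned model could replace this block later.
        if "hello" in observation.lower():
            return "greet"
        if observation in self.memory:
            return "recall"
        return "ask_for_more"

    def act(self, observation: str) -> str:
        action = self.decide(observation)
        self.memory.append(observation)      # store the event for future decisions
        return {"greet": "Hi there!",
                "recall": "You told me that before.",
                "ask_for_more": "Can you tell me more?"}[action]

agent = ScratchAgent(goal="hold a simple conversation")
for msg in ["Hello!", "I like chess", "I like chess"]:
    print(msg, "->", agent.act(msg))
```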

Visit the following resources to learn more:

- [@article@A Step-by-Step Guide to Building an AI Agent From Scratch](https://www.neurond.com/blog/how-to-build-an-ai-agent)
- [@article@How to Build AI Agents](https://wotnot.io/blog/build-ai-agents)
- [@article@Build Your Own AI Agent from Scratch in 30 Minutes](https://medium.com/@gurpartap.sandhu3/build-you-own-ai-agent-from-scratch-in-30-mins-using-simple-python-1458f8099da0)
- [@video@Building an AI Agent From Scratch](https://www.youtube.com/watch?v=bTMPwUgLZf0)
@@ -1,3 +1,12 @@

# Max Length

Max Length sets the maximum number of tokens a language model can generate in one reply. Tokens are pieces of text—roughly 100 tokens equals a short paragraph. A small limit saves time and cost but risks cutting answers short. A large limit allows full, detailed replies but needs more compute and can lose focus. Choose limits based on the task: short limits for tweets, longer ones for articles. Tuning Max Length carefully helps balance clarity, speed, and cost.
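
In most chat APIs this cap is a single request parameter. The sketch below assumes the OpenAI Python SDK, where it is passed as `max_tokens`; other providers use similarly named settings.

```python
# Minimal sketch, assuming the OpenAI Python SDK and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

reply = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain tokens in one tweet."}],
    max_tokens=60,            # hard cap on reply length; raise it for long-form answers
)

print(reply.choices[0].message.content)
print("finish_reason:", reply.choices[0].finish_reason)  # "length" means the cap cut it off
```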

Visit the following resources to learn more:

- [@official@OpenAI Token Usage](https://platform.openai.com/docs/guides/gpt/managing-tokens)
- [@official@Size and Max Token Limits](https://docs.anthropic.com/claude/docs/size-and-token-limits)
- [@article@Utilising Max Token Context Window of Anthropic Claude](https://medium.com/@nampreetsingh/utilising-max-token-context-window-of-anthropic-claude-on-amazon-bedrock-7377d94b2dfa)
- [@article@Controlling the Length of OpenAI Model Responses](https://help.openai.com/en/articles/5072518-controlling-the-length-of-openai-model-responses)
- [@article@Max Model Length in AI](https://www.restack.io/p/ai-model-answer-max-model-length-cat-ai)
- [@video@Understanding ChatGPT/OpenAI Tokens](https://youtu.be/Mo3NV5n1yZk)
@@ -1,3 +1,10 @@

# MCP Client

The MCP Client is the part of an AI agent that talks to the language model API. It collects messages, files, and tool signals, packs them using the Model Context Protocol, and sends them to the model. When a reply comes back, it unpacks it, checks the format, and passes the result to other modules. It also tracks token usage, filters private data, retries failed calls, and logs important events for debugging.
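
A hedged sketch of the client side using the official `mcp` Python SDK over stdio: it launches a local server script (for example, the FastMCP sketch shown earlier saved as `server.py`), lists the tools it exposes, and calls one. Method names should be confirmed against the current SDK docs.

```python
# Minimal sketch, assuming the official `mcp` Python SDK; server.py and the "add"
# tool are carried over from the earlier local-desktop example.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    server = StdioServerParameters(command="python", args=["server.py"])
    async with stdio_client(server) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()                 # protocol handshake
            tools = await session.list_tools()         # discover what the server exposes
            print([tool.name for tool in tools.tools])
            result = await session.call_tool("add", {"a": 2, "b": 3})
            print(result.content)                      # structured result for the agent

asyncio.run(main())
```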

Visit the following resources to learn more:

- [@opensource@Model Context Protocol](https://github.com/modelcontextprotocol/modelcontextprotocol)
- [@official@Model Context Protocol](https://modelcontextprotocol.io/introduction)
- [@official@OpenAI API Reference](https://platform.openai.com/docs/api-reference)
- [@official@Anthropic API Documentation](https://docs.anthropic.com/claude/reference)
@@ -1,3 +1,10 @@

# MCP Hosts

MCP Hosts are computers or services that run the Model Context Protocol. They handle incoming calls, load the MCP manifest, check requests, and pass data between users, tools, and language models. Hosts may cache recent messages, track token usage, and add safety or billing checks before sending prompts to the model. They expose an API endpoint so apps can connect easily. You can run a host on your laptop for testing or deploy it on cloud platforms for scale. The host acts as the trusted bridge where agents, tools, and data meet.

Visit the following resources to learn more:

- [@official@Vercel Serverless Hosting](https://vercel.com/docs)
- [@article@The Ultimate Guide to MCP](https://guangzhengli.com/blog/en/model-context-protocol)
- [@article@AWS MCP Servers for Code Assistants](https://aws.amazon.com/blogs/machine-learning/introducing-aws-mcp-servers-for-code-assistants-part-1/)
- [@opensource@punkeye/awesome-mcp-servers](https://github.com/punkpeye/awesome-mcp-servers)
@@ -1,3 +1,10 @@

# MCP Servers

An MCP Server is the main machine or cloud service that runs the Model Context Protocol. It keeps the shared “memory” that different AI agents need so they stay on the same page. When an agent sends a request, the server checks who is asking, pulls the right context from its store, and sends it back fast. It also saves new facts and task results so the next agent can use them. An MCP Server must handle many users at once, protect private data with strict access rules, and log every change for easy roll-back. Good servers break work into small tasks, spread them across many computers, and add backups so they never lose data. In short, the MCP Server is the hub that makes sure all agents share fresh, safe, and correct context.

Visit the following resources to learn more:

- [@article@Introducing the Azure MCP Server](https://devblogs.microsoft.com/azure-sdk/introducing-the-azure-mcp-server/)
- [@article@The Ultimate Guide to MCP](https://guangzhengli.com/blog/en/model-context-protocol)
- [@article@AWS MCP Servers for Code Assistants](https://aws.amazon.com/blogs/machine-learning/introducing-aws-mcp-servers-for-code-assistants-part-1/)
- [@opensource@punkeye/awesome-mcp-servers](https://github.com/punkpeye/awesome-mcp-servers)
@@ -1,3 +1,10 @@

# Metrics to Track

To judge how well an AI agent works, you need clear numbers. Track accuracy, precision, recall, and F1 score to measure correctness. For ranking tasks, use metrics like mean average precision or ROC-AUC. If users interact with the agent, monitor response time, latency, and failure rates. Safety metrics count toxic or biased outputs, while robustness tests check how the agent handles messy or tricky inputs. Resource metrics—memory, CPU, and energy—show if it can scale. Pick the metrics that match your goal, compare against a baseline, and track trends across versions.
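
For the classification-style metrics, scikit-learn provides standard implementations; the labels below are invented simply to show the calls.

```python
# Minimal sketch: standard quality metrics with scikit-learn (toy labels for illustration).
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # ground-truth labels from your eval set
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # what the agent answered

print("accuracy ", accuracy_score(y_true, y_pred))
print("precision", precision_score(y_true, y_pred))
print("recall   ", recall_score(y_true, y_pred))
print("f1       ", f1_score(y_true, y_pred))
# Track these per release and compare against your baseline before shipping.
```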

Visit the following resources to learn more:

- [@article@Robustness Testing for AI](https://mitibmwatsonailab.mit.edu/category/robustness/)
- [@article@Complete Guide to Machine Learning Evaluation Metrics](https://medium.com/analytics-vidhya/complete-guide-to-machine-learning-evaluation-metrics-615c2864d916)
- [@article@Measuring Model Performance](https://developers.google.com/machine-learning/crash-course/classification/accuracy)
- [@article@A Practical Framework for (Gen)AI Value Measurement](https://medium.com/google-cloud/a-practical-framework-for-gen-ai-value-measurement-5fccf3b66c43)
@@ -1,3 +1,10 @@

# Model Context Protocol (MCP)

Model Context Protocol (MCP) is a rulebook that tells an AI agent how to pack background information before it sends a prompt to a language model. It lists what pieces go into the prompt—things like the system role, the user’s request, past memory, tool calls, or code snippets—and fixes their order. Clear tags mark each piece, so both humans and machines can see where one part ends and the next begins. Keeping the format steady cuts confusion, lets different tools work together, and makes it easier to test or swap models later. When agents follow MCP, the model gets a clean, complete prompt and can give better answers.

Visit the following resources to learn more:

- [@opensource@Model Context Protocol](https://github.com/modelcontextprotocol/modelcontextprotocol)
- [@official@Model Context Protocol](https://modelcontextprotocol.io/introduction)
- [@article@Introducing the Azure MCP Server](https://devblogs.microsoft.com/azure-sdk/introducing-the-azure-mcp-server/)
- [@article@The Ultimate Guide to MCP](https://guangzhengli.com/blog/en/model-context-protocol)
@@ -1,3 +1,8 @@

# NPC / Game AI

Game studios use AI agents to control non-player characters (NPCs). The agent observes the game state and decides actions like moving, speaking, or fighting. It can shift tactics when the player changes strategy, keeping battles fresh instead of predictable. A quest giver might use an agent to offer hints that fit the player’s progress. In open-world games, agents guide crowds to move around obstacles, set new goals, and react to threats, making towns feel alive. Designers save time by writing broad rules and letting agents fill in details instead of hand-coding every scene. Smarter NPC behavior keeps players engaged and boosts replay value.
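
A toy sketch of the kind of rule-based decision logic an NPC might use, which an agent layer can later replace or extend; the game-state fields are invented for the example.

```python
# Toy sketch: an NPC picking an action from the current game state (invented fields).
def npc_decide(state: dict) -> str:
    """Return the NPC's next action based on simple, designer-written rules."""
    if state["health"] < 25:
        return "retreat_to_cover"
    if state["player_visible"] and state["distance_to_player"] < 5:
        return "melee_attack"
    if state["player_visible"]:
        return "advance_and_shoot"
    if state["heard_noise"]:
        return "investigate_noise"
    return "patrol"

print(npc_decide({"health": 80, "player_visible": True,
                  "distance_to_player": 12, "heard_noise": False}))
# -> "advance_and_shoot"; an LLM-driven agent could choose among the same actions instead.
```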

Visit the following resources to learn more:

- [@official@Unreal Engine – Artificial Intelligence](https://dev.epicgames.com/documentation/en-us/unreal-engine/artificial-intelligence-in-unreal-engine?application_version=5.3)
- [@article@AI-Driven NPCs: The Future of Gaming Explained](https://www.capermint.com/blog/everything-you-need-to-know-about-non-player-character-npc/)
@@ -1,3 +1,8 @@

# Observation & Reflection

Observation and reflection form the thinking pause in an AI agent’s loop. First, the agent looks at the world around it, gathers fresh data, and sees what has changed. It then pauses to ask, “What does this new information mean for my goal?” During this short check, the agent updates its memory, spots errors, and ranks what matters most. These steps guide wiser plans and actions in the next cycle. Without careful observation and reflection, the agent would rely on old or wrong facts and soon drift off course.

Visit the following resources to learn more:

- [@official@Best Practices for Prompting and Self-checking](https://platform.openai.com/docs/guides/prompt-engineering)
- [@article@Self-Reflective AI: Building Agents That Learn by Observing Themselves](https://arxiv.org/abs/2302.14045)
@@ -1,3 +1,11 @@

# Open Weight Models

Open-weight models are neural networks whose trained parameters, also called weights, are shared with everyone. Anyone can download the files, run the model, fine-tune it, or build tools on top of it. The licence that comes with the model spells out what you are allowed to do. Some licences are very permissive and even let you use the model for commercial work. Others allow only research or personal projects. Because the weights are public, the community can inspect how the model works, check for bias, and suggest fixes. Open weights also lower costs, since teams do not have to train a large model from scratch. Well-known examples include BLOOM, Falcon, and Llama 2.
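
Because the weights are public, running one locally takes only a few lines with Hugging Face `transformers`. The sketch below uses GPT-2 as a small, classic open-weight example that downloads on first use; larger checkpoints such as a Falcon or Llama model can be swapped in if your hardware and the licence allow.

```python
# Minimal sketch: downloading and running a small open-weight model with transformers.
from transformers import pipeline

# "gpt2" is a small, classic open-weight model; swap in a larger open checkpoint
# (e.g. a Falcon or Llama variant) if your hardware and its licence permit.
generator = pipeline("text-generation", model="gpt2")

result = generator("Open-weight models let anyone", max_new_tokens=30)
print(result[0]["generated_text"])
```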

Visit the following resources to learn more:

- [@official@BLOOM BigScience](https://bigscience.huggingface.co/)
- [@official@Falcon LLM – Technology Innovation Institute (TII)](https://falconllm.tii.ae/)
- [@official@Llama 2 – Meta's Official Announcement](https://ai.meta.com/llama/)
- [@official@Hugging Face – Open LLM Leaderboard (Top Open Models)](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
- [@official@EleutherAI – Open Research Collective (GPT-Neo, GPT-J, etc.)](https://www.eleuther.ai/)
@@ -1,3 +1,10 @@

# OpenAI Assistant API

The OpenAI Assistants API lets you add clear, task-specific actions to a chat with a large language model. You first describe each action you want the model to use, giving it a name, a short purpose, and a list of inputs in JSON form. During the chat, the model may decide that one of these actions will help. It then returns the name of the action and a JSON object with the input values it thinks are right. Your code receives this call, runs real work such as a database query or a web request, and sends the result back to the model. The model reads the result and continues the chat, now armed with fresh facts. This loop lets you keep control of what real work happens while still letting the model plan and talk in natural language.
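
A condensed, hedged sketch of that loop with the OpenAI Python SDK's beta Assistants endpoints. The weather tool schema is an invented example, the helper names vary between SDK versions, and the tool-output submission step is only summarized in the final comment.

```python
# Hedged sketch, assuming the OpenAI Python SDK's beta Assistants endpoints;
# the get_weather tool is an invented example.
from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    name="Weather helper",
    instructions="Use the get_weather tool when asked about the weather.",
    model="gpt-4o-mini",
    tools=[{"type": "function", "function": {
        "name": "get_weather",
        "description": "Get the forecast for a city",
        "parameters": {"type": "object",
                       "properties": {"city": {"type": "string"}},
                       "required": ["city"]}}}],
)

thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="What's the weather in Paris?"
)
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id=assistant.id
)

# When run.status == "requires_action", read the requested tool call, run your real
# function, and return its JSON result via submit_tool_outputs so the chat continues.
print(run.status)
```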

Visit the following resources to learn more:

- [@official@OpenAI Documentation – Assistants API Overview](https://platform.openai.com/docs/assistants/overview)
- [@official@OpenAI Blog – Introducing the Assistants API](https://openai.com/blog/assistants-api)
- [@official@OpenAI Cookbook – Assistants API Example](https://github.com/openai/openai-cookbook/blob/main/examples/Assistants_API_overview_python.ipynb)
- [@official@OpenAI API Reference – Assistants Endpoints](https://platform.openai.com/docs/api-reference/assistants)
@@ -1,3 +1,11 @@

# OpenAI Functions Calling

OpenAI Function Calling lets you give a language model a list of tools and have it decide which one to use and with what data. You describe each tool with a short name, what it does, and the shape of its inputs in a small JSON-like schema. You then pass the user message and this tool list to the model. Instead of normal text, the model can reply with a JSON block that names the tool and fills in the needed arguments. Your program reads this block, runs the real function, and can send the result back for the next step. This pattern makes agent actions clear, easy to parse, and hard to abuse, because the model cannot run code on its own and all calls go through your checks. It also cuts down on prompt hacks and wrong formats, so agents work faster and more safely.
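
A minimal sketch using the OpenAI Python SDK's chat completions `tools` parameter; the weather tool is an invented example, and the result would normally be sent back in a follow-up message.

```python
# Minimal sketch, assuming the OpenAI Python SDK; get_weather is an invented tool.
import json
from openai import OpenAI

client = OpenAI()
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

tool_call = response.choices[0].message.tool_calls[0]   # the model picked a tool
args = json.loads(tool_call.function.arguments)         # strict JSON, easy to parse
print(tool_call.function.name, args)                    # your code now runs the real function
```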

Visit the following resources to learn more:

- [@official@OpenAI Documentation – Function Calling](https://platform.openai.com/docs/guides/function-calling)
- [@official@OpenAI Cookbook – Using Functions with GPT Models](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_call_functions_with_chat_models.ipynb)
- [@official@OpenAI Blog – Announcing Function Calling and Other Updates](https://openai.com/blog/function-calling-and-other-api-updates)
- [@official@OpenAI API Reference – Functions Section](https://platform.openai.com/docs/api-reference/chat/create#functions)
- [@official@OpenAI Community – Discussions and Examples on Function Calling](https://community.openai.com/tag/function-calling)
@@ -1,3 +1,10 @@

# openllmetry

openllmetry is a small Python library that makes it easy to watch what your AI agent is doing and how well it is working. It wraps calls to large-language-model APIs, vector stores, and other tools, then sends logs, traces, and simple metrics to any backend that speaks the OpenTelemetry standard, such as Jaeger, Zipkin, or Grafana. You add one or two lines of code at start-up, and the library captures prompt text, model name, latency, token counts, and costs each time the agent asks the model for an answer. The data helps you spot slow steps, high spend, or bad answers, and it lets you play back full traces to debug agent chains. Because it follows OpenTelemetry, you can mix these AI traces with normal service traces and see the whole flow in one place.
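
A hedged sketch of the start-up call, assuming the `traceloop-sdk` package that ships OpenLLMetry; the decorator name and exporter configuration should be confirmed against the Traceloop docs.

```python
# Minimal sketch, assuming the traceloop-sdk package (which provides OpenLLMetry).
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow

# One call at start-up; traces go to Traceloop by default, or to any
# OpenTelemetry-compatible backend you configure via environment variables.
Traceloop.init(app_name="my-agent")

@workflow(name="answer_question")       # groups the LLM calls inside into one trace
def answer_question(question: str) -> str:
    # calls to OpenAI, vector stores, etc. made here are instrumented automatically
    return f"stub answer to: {question}"

print(answer_question("How slow is my retrieval step?"))
```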

Visit the following resources to learn more:

- [@official@OpenLLMetry – Traceloop Blog](https://www.traceloop.com/blog/openllmetry)
- [@official@What is OpenLLMetry? - traceloop](https://www.traceloop.com/docs/openllmetry/introduction)
- [@official@Use Traceloop with Python](https://www.traceloop.com/docs/openllmetry/getting-started-python)
- [@opensource@traceloop/openllmetry](https://github.com/traceloop/openllmetry)