From f69130e918eb423668dbc7b39586027aa1eba55e Mon Sep 17 00:00:00 2001
From: "Charles J. Fowler"
Date: Mon, 4 Nov 2024 09:52:32 +0000
Subject: [PATCH] Improve Prompt Engineering - Pitfalls of LLMs - Content & Links (#7666)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* 📃 docs, data (Image Prompting) Update Topic/Sub Topics - In Place Edits.

- intent: Update topic from May 2023 to Oct 2024
- data: src/data/roadmaps/prompt-engineering/content/
- modify - 10X .ms

---

Co-authored-by: @iPoetDev

* 📃 docs, data (Prompt Engineering Roadmap) Basic Concepts - In Place Edits.

- changes: single paragraphs (74-125 words)
- concerns: if any more concise, topics lose fidelity, meaning and utility.
- data: src/data/roadmaps/prompt-engineering/content/
- 📂 100-basic-llm
- modify: Topic - update content:
  - index.md
  - 100-what-are-llm.md
  - 101-llm-types.md
  - 102-how-llms-built.md

---

Co-authored-by: @iPoetDev

* 📃 docs: (Prompt Eng.) Basic LLM Concepts - New Links.

- intent: Update topic from May 2023 to Oct 2024
- 📂 100 basic-llm
- modify topics:
  - add links
  - 100-what-are-llms.md
  - 101-types-llms.md
  - 102-how-llms-are-built.md

BREAKING CHANGE: ❌

---

Co-authored-by: @iPoetDev

* docs: (Prompt Eng.) Prompting Introduction - New Links.

- intent: Update topic from May 2023 to Oct 2024
- 📂 101-prompting-introduction
- modify topics:
  - add links
  - index.md
  - 100-basic-prompting.md
  - 101-need-for-prompting.md

BREAKING CHANGE: ❌

---

Co-authored-by: @iPoetDev

* 📃 docs: (Prompt Eng.) Real World Uses - Content & Links.

- intent:
  - Update topic and links from May 2023 to Oct 2024.
  - Real-world use cases are dynamic and evolving.
  - Remodelled existing examples.
- data: src/data/roadmaps/prompt-engineering/content/
- 📂 103-real-world
- modify: Content Improve, 1st paragraph.
- modify: Expanded Content paragraphs
  - index.md
  - 100-structured-data.md
  - 101-inferring.md
  - 102-writing-emails.md
  - 103-coding-assistance.md
  - 104-study-buddy.md
  - 105-designing-chatbots.md
- modify: Links New
  - index.md
  - 100-structured-data.md
  - 101-inferring.md
  - 102-writing-emails.md
  - 103-coding-assistance.md
  - 104-study-buddy.md
  - 105-designing-chatbots.md

BREAKING CHANGE: ❌

---

Co-authored-by: @iPoetDev

* 📃 docs: (Prompt Eng.) LLM Pitfalls - Links.
- intent: Insert Links from May 2023 to Oct 2024
- data: src/data/roadmaps/prompt-engineering/content/
- 📂 104-llm-pitfalls
- modify: Links New
  - index.md
  - 100-citing-sources.md
  - 101-bias.md
  - 102-hallucinations.md
  - 103-math.md
  - 104-prompt-hacking.md
- modify: Copy Refresh
  - index.md
  - 100-citing-sources.md
  - 101-bias.md
  - 102-hallucinations.md
  - 103-math.md
  - 104-prompt-hacking.md

BREAKING CHANGE: ❌

---

Co-authored-by: @iPoetDev

* Apply suggestions from code review

spacing and styling fixes

---------

Co-authored-by: Dan
---
 .../104-llm-pitfalls/100-citing-sources.md |  7 +-
 .../content/104-llm-pitfalls/101-bias.md   |  9 ++-
 .../104-llm-pitfalls/102-hallucinations.md | 29 ++-----
 .../content/104-llm-pitfalls/103-math.md   |  6 +-
 .../104-llm-pitfalls/104-prompt-hacking.md | 16 ++--
 .../content/104-llm-pitfalls/index.md      | 75 +++++++++++++++----
 6 files changed, 95 insertions(+), 47 deletions(-)

diff --git a/src/data/roadmaps/prompt-engineering/content/104-llm-pitfalls/100-citing-sources.md b/src/data/roadmaps/prompt-engineering/content/104-llm-pitfalls/100-citing-sources.md
index 79c752a85..c72056e32 100644
--- a/src/data/roadmaps/prompt-engineering/content/104-llm-pitfalls/100-citing-sources.md
+++ b/src/data/roadmaps/prompt-engineering/content/104-llm-pitfalls/100-citing-sources.md
@@ -1,5 +1,8 @@
 # Citing Sources
 
-LLMs for the most part cannot accurately cite sources. This is because they do not have access to the Internet, and do not exactly remember where their information came from. They will frequently generate sources that look good, but are entirely inaccurate.
+Although advancements have been made in the ability of Large Language Models (LLMs) to cite sources, particularly through real-time API access, search-augmented generation and specialized training, significant limitations persist. LLMs continue to struggle with hallucinations, generating inaccurate or fictitious citations. Many LLMs lack real-time API access, which hampers their ability to provide up-to-date information, or are limited by their knowledge cutoff dates. They sometimes cannot independently verify sources or fully grasp the contextual relevance of citations, raising concerns regarding plagiarism and intellectual property. To address these challenges, ongoing efforts focus on improving retrieval-augmented generation (RAG) methods, enhancing training, and integrating human oversight to ensure accuracy in citations.
 
-Strategies like search augmented LLMs (LLMs that can search the Internet and other sources) can often fix this problem though.
+Learn more from the following resources:
+
+- [@guides@Why Don't Large Language Models Share URL References in Their Responses](https://medium.com/@gcentulani/why-dont-large-language-models-share-url-references-in-their-responses-bf427e513861)
+- [@article@Effective large language model adaptation for improved grounding](https://research.google/blog/effective-large-language-model-adaptation-for-improved-grounding/)
\ No newline at end of file
diff --git a/src/data/roadmaps/prompt-engineering/content/104-llm-pitfalls/101-bias.md b/src/data/roadmaps/prompt-engineering/content/104-llm-pitfalls/101-bias.md
index 96326d891..3e5ba0875 100644
--- a/src/data/roadmaps/prompt-engineering/content/104-llm-pitfalls/101-bias.md
+++ b/src/data/roadmaps/prompt-engineering/content/104-llm-pitfalls/101-bias.md
@@ -1,4 +1,11 @@
 # Bias
 
-LLMs are often biased towards generating stereotypical responses. Even with safe guards in place, they will sometimes say sexist/racist/homophobic things. Be careful when using LLMs in consumer-facing applications, and also be careful when using them in research (they can generate biased results).
+Bias in Large Language Models (LLMs) remains a significant challenge, with models often generating stereotypical or discriminatory responses despite advancements in mitigation techniques. These biases can manifest in various forms, including gender, racial, and cultural prejudices, potentially leading to skewed or unfair model outputs. Recent studies have highlighted persistent biases in LLM-generated content, emphasizing the need for caution when deploying these models in consumer-facing applications or research settings. Efforts to address this issue include developing diverse training datasets, implementing regulatory frameworks, and creating new evaluation tools. However, the challenge remains substantial as LLMs continue to influence societal perceptions. Developers and users must be aware of these pitfalls to avoid reputational damage and unintended negative impacts on individuals or communities.
 
+Learn more from the following resources:
+
+- [@guides@Biases in Prompts: Learn how to tackle them](https://mindfulengineer.ai/understanding-biases-in-prompts/)
+- [@guides@Bias in AI: tackling the issues through regulations and standards](https://publicpolicy.ie/papers/bias-in-ai-tackling-the-issues-through-regulations-and-standards/)
+- [@article@What Is AI Bias?](https://www.ibm.com/topics/ai-bias)
+- [@article@What Is Algorithmic Bias?](https://www.ibm.com/think/topics/algorithmic-bias)
+- [@article@AI Bias Examples](https://www.ibm.com/think/topics/shedding-light-on-ai-bias-with-real-world-examples)
\ No newline at end of file
diff --git a/src/data/roadmaps/prompt-engineering/content/104-llm-pitfalls/102-hallucinations.md b/src/data/roadmaps/prompt-engineering/content/104-llm-pitfalls/102-hallucinations.md
index 4ca2714b9..22e0bf72b 100644
--- a/src/data/roadmaps/prompt-engineering/content/104-llm-pitfalls/102-hallucinations.md
+++ b/src/data/roadmaps/prompt-engineering/content/104-llm-pitfalls/102-hallucinations.md
@@ -1,27 +1,14 @@
 # Hallucinations
 
-LLMs will frequently generate falsehoods when asked a question that they do not know the answer to. Sometimes they will state that they do not know the answer, but much of the time they will confidently give a wrong answer.
+Large Language Model (LLM) hallucinations in 2024 can be broadly categorized into faithfulness and factuality issues. **Faithfulness hallucinations** occur when the model's output deviates from provided sources or context, including problems with source-reference divergence, context retrieval, dialogue history misinterpretation, and erroneous summarization. **Factuality hallucinations**, on the other hand, involve the generation of incorrect or unsupported information, encompassing factual inaccuracies, entity errors, overclaims, unverifiable statements, nonsensical responses, contradictions, and fabricated data.
 
-### Causes of Hallucinations
+These hallucinations stem from various causes such as training data issues, model limitations, prompt-related problems, and overfitting. To mitigate these challenges, strategies like Retrieval-Augmented Generation (RAG), improved training data, rigorous evaluation, clear user communication, advanced prompt engineering, model fine-tuning, output filtering, and multi-model approaches are being employed. As the field progresses, understanding and addressing these hallucination types remains crucial for enhancing the reliability and trustworthiness of LLM-generated content.
 
-There are several factors contributing to hallucinations in LMs:
+Learn more from the following resources:
 
-1. **Inherent limitations**: The training data for the LMs are massive, yet they still cannot contain the entire knowledge about the world. As a result, LMs have inherent limitations in handling certain facts or details, which leads to hallucinations in the generated text.
+- [@article@What are AI hallucinations?](https://www.ibm.com/topics/ai-hallucinations)
+- [@article@Hallucination (artificial intelligence) - Wikipedia](https://en.wikipedia.org/wiki/Hallucination_(artificial_intelligence))
+- [@video@Why Large Language Models Hallucinate - IBM](https://www.youtube.com/watch?v=cfqtFvWOfg0)
+- [@video@Risks of Large Language Models - IBM](https://www.youtube.com/watch?v=r4kButlDLUc)
+- [@guides@Key Strategies to Minimize LLM Hallucinations](https://www.turing.com/resources/minimize-llm-hallucinations-strategy)
-
-2. **Training data biases**: If the training data contains biases or errors, it may lead to hallucinations in the output as LMs learn from the data they've been exposed to.
-
-3. **Token-based scoring**: The default behavior of many LMs, like GPT models, is to generate text based on token probabilities. Sometimes this can lead to high-probability tokens being selected even if it doesn't make sense with the given prompt.
-
-### Mitigating Hallucinations
-
-To reduce the occurrence of hallucinations in the generated text, consider the following strategies:
-
-1. **Specify instructions**: Make the prompt more explicit with clear details and constraints. This can help guide the model to generate more accurate and coherent responses.
-
-2. **Step-by-step approach**: Instead of asking the model to generate a complete response in one go, break down the task into smaller steps and iteratively generate the output. This can help in maintaining better control over the generated content.
-
-3. **Model adjustments**: Tweak various parameters, such as `temperature` or `top_p`, to adjust the randomness and control of the generated text. Lower values will make the output more conservative, which can help reduce hallucinations.
-
-4. **Validating and filtering**: Develop post-processing steps to validate and filter the generated text based on specific criteria or rules to minimize the prevalence of hallucinations in the output.
-
-Remember that even with these strategies, it's impossible to completely eliminate hallucinations. However, being aware of their existence and employing methods to mitigate them can significantly improve the quality and reliability of LM-generated content.
diff --git a/src/data/roadmaps/prompt-engineering/content/104-llm-pitfalls/103-math.md b/src/data/roadmaps/prompt-engineering/content/104-llm-pitfalls/103-math.md
index b9e939ebc..793fb08c0 100644
--- a/src/data/roadmaps/prompt-engineering/content/104-llm-pitfalls/103-math.md
+++ b/src/data/roadmaps/prompt-engineering/content/104-llm-pitfalls/103-math.md
@@ -1,3 +1,7 @@
 # Math
 
-LLMs are often bad at math. They have difficulty solving simple math problems, and they are often unable to solve more complex math problems.
+LLMs still struggle with math, particularly with mathematical reasoning. Although they have improved at solving simple math problems, they often come up short on more complex ones. Studies show that LLMs rely heavily on pattern recognition rather than genuine logical reasoning, leading to significant performance drops when faced with minor changes in problem wording or irrelevant information. This highlights a critical limitation in their reasoning capabilities.
+
+Learn more from the following resources:
+
+- [@article@Apple Says AI's Math Skills Fall Short](https://www.pymnts.com/artificial-intelligence-2/2024/apple-says-ais-math-skills-fall-short/)
\ No newline at end of file
diff --git a/src/data/roadmaps/prompt-engineering/content/104-llm-pitfalls/104-prompt-hacking.md b/src/data/roadmaps/prompt-engineering/content/104-llm-pitfalls/104-prompt-hacking.md
index 1040411d0..1c142d22e 100644
--- a/src/data/roadmaps/prompt-engineering/content/104-llm-pitfalls/104-prompt-hacking.md
+++ b/src/data/roadmaps/prompt-engineering/content/104-llm-pitfalls/104-prompt-hacking.md
@@ -1,13 +1,11 @@
 # Prompt Hacking
 
-Prompt hacking is a term used to describe a situation where a model, specifically a language model, is tricked or manipulated into generating outputs that violate safety guidelines or are off-topic. This could include content that's harmful, offensive, or not relevant to the prompt.
+Prompt hacking is a form of adversarial prompting where language models are manipulated to generate outputs that violate safety guidelines or are off-topic. Common techniques include manipulating keywords, exploiting grammar and negations, and using leading questions. To combat this, developers implement safety mechanisms such as content filters, continual analysis, and carefully designed prompt templates. As language models become more integrated into digital infrastructure, concerns about prompt injection, data leakage, and potential misuse have grown. In response, evolving defense strategies like prompt shields, enhanced input validation, and fine-tuning for adversarial detection are being developed. Continuous monitoring and improvement of these safety measures are crucial to ensure responsible model behaviour and output alignment with desired guidelines.
 
-There are a few common techniques employed by users to attempt "prompt hacking," such as:
+Learn more from the following resources:
 
-1. **Manipulating keywords**: Users may introduce specific keywords or phrases that are linked to controversial, inappropriate, or harmful content in order to trick the model into generating unsafe outputs.
-2. **Playing with grammar**: Users could purposely use poor grammar, spelling, or punctuation to confuse the model and elicit responses that might not be detected by safety mitigations.
-3. **Asking leading questions**: Users can try to manipulate the model by asking highly biased or loaded questions, hoping to get a similar response from the model.
-
-To counteract prompt hacking, it's essential for developers and researchers to build in safety mechanisms such as content filters and carefully designed prompt templates to prevent the model from generating harmful or unwanted outputs. Constant monitoring, analysis, and improvement to the safety mitigations in place can help ensure the model's output aligns with the desired guidelines and behaves responsibly.
-
-Read more about prompt hacking here [Prompt Hacking](https://learnprompting.org/docs/category/-prompt-hacking).
+
+- [@article@Prompt Hacking](https://learnprompting.org/docs/category/-prompt-hacking)
+- [@article@LLM Security Guide - Understanding the Risks of Prompt Injections and Other Attacks on Large Language Models](https://www.mlopsaudits.com/blog/llm-security-guide-understanding-the-risks-of-prompt-injections-and-other-attacks-on-large-language-models)
+- [@guides@OWASP Top 10 for LLM & Generative AI Security](https://genai.owasp.org/llm-top-10/)
+- [@video@Explained: The OWASP Top 10 for Large Language Model Applications](https://www.youtube.com/watch?v=cYuesqIKf9A)
+- [@video@Artificial Intelligence: The new attack surface](https://www.youtube.com/watch?v=_9x-mAHGgC4)
\ No newline at end of file
diff --git a/src/data/roadmaps/prompt-engineering/content/104-llm-pitfalls/index.md b/src/data/roadmaps/prompt-engineering/content/104-llm-pitfalls/index.md
index d48968c7e..de64987ad 100644
--- a/src/data/roadmaps/prompt-engineering/content/104-llm-pitfalls/index.md
+++ b/src/data/roadmaps/prompt-engineering/content/104-llm-pitfalls/index.md
@@ -1,27 +1,76 @@
 # Pitfalls of LLMs
 
-LLMs are extremely powerful, but they are by no means perfect. There are many pitfalls that you should be aware of when using them.
+LLMs are extremely powerful, but there are many pitfalls, safety challenges and risks that you should be aware of when using them.
 
-### Model Guessing Your Intentions
+### Language Translation
 
-Sometimes, LLMs might not fully comprehend the intent of your prompt and may generate generic or safe responses. To mitigate this, make your prompts more explicit or ask the model to think step-by-step before providing a final answer.
+There are several risks associated with LLMs in language translation.
 
-### Sensitivity to Prompt Phrasing
+- Inaccurate translations
+- Contextual misinterpretation
+- Biased translations
+- Deepfakes
+- Privacy and data security
+- Legal and regulatory compliance
 
-LLMs can be sensitive to the phrasing of your prompts, which might result in completely different or inconsistent responses. Ensure that your prompts are well-phrased and clear to minimize confusion.
+### Text Generation
 
-### Model Generating Plausible but Incorrect Answers
+Text generation is a powerful capability of LLMs but also introduces certain risks and challenges.
 
-In some cases, LLMs might generate answers that sound plausible but are actually incorrect. One way to deal with this is by adding a step for the model to verify the accuracy of its response or by prompting the model to provide evidence or a source for the given information.
+- Misinformation and fake news
+- Bias amplification
+- Offensive or inappropriate content
+- Plagiarism and copyright infringement
+- Lack of transparency
+- Privacy breaches
 
-### Verbose or Overly Technical Responses
+### Question Answering
 
-LLMs, especially larger ones, may generate responses that are unnecessarily verbose or overly technical. To avoid this, explicitly guide the model by making your prompt more specific, asking for a simpler response, or requesting a particular format.
+LLMs present several risks in the domain of question answering.
 
-### LLMs Not Asking for Clarification
+- Hallucination
+- Outdated information
+- Bias
+- Harmful answers
+- Lack of contextual understanding
+- Privacy and security concerns
+- Lack of transparency and explainability
 
-When faced with an ambiguous prompt, LLMs might try to answer it without asking for clarification. To encourage the model to seek clarification, you can prepend your prompt with "If the question is unclear, please ask for clarification."
+### Text Summarization
 
-### Model Failure to Perform Multi-part Tasks
+Text summarization is a powerful application of LLMs but also introduces certain risks and challenges.
 
-Sometimes, LLMs might not complete all parts of a multi-part task or might only focus on one aspect of it. To avoid this, consider breaking the task into smaller, more manageable sub-tasks or ensure that each part of the task is clearly identified in the prompt.
+- Information loss
+- Bias amplification
+- Contextual misinterpretation
+
+### Sentiment Analysis
+
+Sentiment analysis, the process of determining a piece of text's sentiment or emotional tone, is an application where LLMs are frequently employed.
+
+- Biased sentiment analysis
+- Cultural and contextual nuances
+- Limited domain understanding
+- Misinterpretation of negation and ambiguity
+- Overgeneralization and lack of individual variation
+
+### Code Assistance
+
+Code assistance and generation is an area where LLMs have shown promising capabilities.
+
+- Security vulnerabilities
+- Performance and efficiency challenges
+- Quality and reliability concerns
+- Insufficient understanding of business or domain context
+- Intellectual property concerns
+
+Read more from [Risks of Large Language Models: A comprehensive guide](https://www.deepchecks.com/risks-of-large-language-models/).
+
+Learn more from the following resources:
+
+- [@video@Risks of Large Language Models - IBM](https://www.youtube.com/watch?v=r4kButlDLUc)
+- [@article@Risks of Large Language Models: A comprehensive guide](https://www.deepchecks.com/risks-of-large-language-models/)
+- [@article@Limitations of LLMs: Bias, Hallucinations, and More](https://learnprompting.org/docs/basics/pitfalls)
+- [@guides@Risks & Misuses | Prompt Engineering Guide](https://www.promptingguide.ai/risks)
+- [@guides@OWASP Top 10 for LLM & Generative AI Security](https://genai.owasp.org/llm-top-10/)
+- [@guides@LLM Security Guide - Understanding the Risks of Prompt Injections and Other Attacks on Large Language Models](https://www.mlopsaudits.com/blog/llm-security-guide-understanding-the-risks-of-prompt-injections-and-other-attacks-on-large-language-models)
\ No newline at end of file