Improve AI engineer content (#7924)

pull/7953/head
Vedansh 1 month ago committed by GitHub
parent 4c6f0a1234
commit a2063c2822
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
  1. src/data/roadmaps/ai-engineer/content/ai-safety-and-ethics@8ndKHDJgL_gYwaXC7XMer.md | 2
  2. src/data/roadmaps/ai-engineer/content/anomaly-detection@AglWJ7gb9rTT2rMkstxtk.md | 2
  3. src/data/roadmaps/ai-engineer/content/chat-completions-api@_bPTciEA1GT1JwfXim19z.md | 2
  4. src/data/roadmaps/ai-engineer/content/development-tools@NYge7PNtfI-y6QWefXJ4d.md | 4
  5. src/data/roadmaps/ai-engineer/content/embeddings@XyEp6jnBSpCxMGwALnYfT.md | 4
  6. src/data/roadmaps/ai-engineer/content/googles-gemini@oe8E6ZIQWuYvHVbYJHUc1.md | 3
  7. src/data/roadmaps/ai-engineer/content/hugging-face-hub@YLOdOvLXa5Fa7_mmuvKEi.md | 2
  8. src/data/roadmaps/ai-engineer/content/indexing-embeddings@5TQnO9B4_LTHwqjI7iHB1.md | 2
  9. src/data/roadmaps/ai-engineer/content/introduction@_hYN0gEi9BL24nptEtXWU.md | 1
  10. src/data/roadmaps/ai-engineer/content/lancedb@rjaCNT3Li45kwu2gXckke.md | 3
  11. src/data/roadmaps/ai-engineer/content/langchain@ebXXEhNRROjbbof-Gym4p.md | 2
  12. src/data/roadmaps/ai-engineer/content/limitations-and-considerations@MXqbQGhNM3xpXlMC2ib_6.md | 2
  13. src/data/roadmaps/ai-engineer/content/llama-index@d0ontCII8KI8wfP-8Y45R.md | 2
  14. src/data/roadmaps/ai-engineer/content/llamaindex-for-multimodal-apps@akQTCKuPRRelj2GORqvsh.md | 2
  15. src/data/roadmaps/ai-engineer/content/llms@wf2BSyUekr1S1q6l8kyq6.md | 2
  16. src/data/roadmaps/ai-engineer/content/manual-implementation@6xaRB34_g0HGt-y1dGYXR.md | 2
  17. src/data/roadmaps/ai-engineer/content/open-ai-embeddings-api@l6priWeJhbdUD5tJ7uHyG.md | 4
  18. src/data/roadmaps/ai-engineer/content/open-vs-closed-source-models@RBwGsq9DngUsl8PrrCbqx.md | 4
  19. src/data/roadmaps/ai-engineer/content/openai-api@zdeuA4GbdBl2DwKgiOA4G.md | 4
  20. src/data/roadmaps/ai-engineer/content/pinecone@_Cf7S1DCvX7p1_3-tP3C3.md | 2
  21. src/data/roadmaps/ai-engineer/content/pre-trained-models@d7fzv_ft12EopsQdmEsel.md | 2
  22. src/data/roadmaps/ai-engineer/content/prompt-engineering@Dc15ayFlzqMF24RqIF_-X.md | 2
  23. src/data/roadmaps/ai-engineer/content/qdrant@DwOAL5mOBgBiw-EQpAzQl.md | 2
  24. src/data/roadmaps/ai-engineer/content/rag--implementation@lVhWhZGR558O-ljHobxIi.md | 5
  25. src/data/roadmaps/ai-engineer/content/roles-and-responsiblities@K9EiuFgPBFgeRxY4wxAmb.md | 2
  26. src/data/roadmaps/ai-engineer/content/semantic-search@eMfcyBxnMY_l_5-8eg6sD.md | 4
  27. src/data/roadmaps/ai-engineer/content/speech-to-text@jQX10XKd_QM5wdQweEkVJ.md | 4
  28. src/data/roadmaps/ai-engineer/content/supabase@9kT7EEQsbeD2WDdN9ADx7.md | 2
  29. src/data/roadmaps/ai-engineer/content/using-sdks-directly@WZVW8FQu6LyspSKm1C_sl.md | 2
  30. src/data/roadmaps/ai-engineer/content/vector-databases@tt9u3oFlsjEMfPyojuqpc.md | 5
  31. src/data/roadmaps/ai-engineer/content/video-understanding@TxaZCtTCTUfwCxAJ2pmND.md | 5
  32. src/data/roadmaps/ai-engineer/content/weaviate@VgUnrZGKVjAAO4n_llq5-.md | 2
  33. src/data/roadmaps/ai-engineer/content/what-are-embeddings@--ig0Ume_BnXb9K2U7HJN.md | 5
  34. src/data/roadmaps/ai-engineer/content/writing-prompts@9-5DYeOnKJq9XvEMWP45A.md | 4

@ -5,4 +5,4 @@ AI safety and ethics involve establishing guidelines and best practices to ensur
Learn more from the following resources:
- [@video@What is AI Ethics?](https://www.youtube.com/watch?v=aGwYtUzMQUk)
- [@article@Understanding artificial intelligence ethics and safety](https://www.turing.ac.uk/news/publications/understanding-artificial-intelligence-ethics-and-safety)
- [@article@Understanding Artificial Intelligence Ethics and Safety](https://www.turing.ac.uk/news/publications/understanding-artificial-intelligence-ethics-and-safety)

@ -4,4 +4,4 @@ Anomaly detection with embeddings works by transforming data, such as text, imag
Learn more from the following resources:
- [@article@Anomoly in Embeddings](https://ai.google.dev/gemini-api/tutorials/anomaly_detection)
- [@article@Anomaly in Embeddings](https://ai.google.dev/gemini-api/tutorials/anomaly_detection)

@ -5,4 +5,4 @@ The OpenAI Chat Completions API is a powerful interface that allows developers t
Learn more from the following resources:
- [@official@Create Chat Completions](https://platform.openai.com/docs/api-reference/chat/create)
- [@article@](https://medium.com/the-ai-archives/getting-started-with-openais-chat-completions-api-in-2024-462aae00bf0a)
- [@article@Getting Start with Chat Completions API](https://medium.com/the-ai-archives/getting-started-with-openais-chat-completions-api-in-2024-462aae00bf0a)

@ -2,7 +2,9 @@
AI has given rise to a collection of AI powered development tools of various different varieties. We have IDEs like Cursor that has AI baked into it, live context capturing tools such as Pieces and a variety of brower based tools like V0, Claude and more.
Learn more from the following resources:
- [@official@v0 Website](https://v0.dev)
- [@official@Aider - AI Pair Programming in Terminal](https://github.com/Aider-AI/aider)
- [@official@Replit AI](https://replit.com/ai)
- [@official@Pieces Website](https://pieces.app)
- [@official@Pieces](https://pieces.app)

@ -4,6 +4,6 @@ Embeddings are dense, continuous vector representations of data, such as words,
Learn more from the following resources:
- [@article@What are embeddings in machine learning?](https://www.cloudflare.com/en-gb/learning/ai/what-are-embeddings/)
- [@article@What is embedding?](https://www.ibm.com/topics/embedding)
- [@article@What are Embeddings in Machine Learning?](https://www.cloudflare.com/en-gb/learning/ai/what-are-embeddings/)
- [@article@What is Embedding?](https://www.ibm.com/topics/embedding)
- [@video@What are Word Embeddings](https://www.youtube.com/watch?v=wgfSDrqYMJ4)

@ -4,5 +4,6 @@ Google Gemini is an advanced AI model by Google DeepMind, designed to integrate
Learn more from the following resources:
- [@official@Google Gemini](https://workspace.google.com/solutions/ai/)
- [@official@Google Gemini](https://gemini.google.com/)
- [@official@Google's Gemini Documentation](https://workspace.google.com/solutions/ai/)
- [@video@Welcome to the Gemini era](https://www.youtube.com/watch?v=_fuimO6ErKI)

@ -4,5 +4,5 @@ The Hugging Face Hub is a comprehensive platform that hosts over 900,000 machine
Learn more from the following resources:
- [@official@Documentation](https://huggingface.co/docs/hub/en/index)
- [@official@Hugging Face Documentation](https://huggingface.co/docs/hub/en/index)
- [@course@nlp-official](https://huggingface.co/learn/nlp-course/en/chapter4/1)

@ -5,4 +5,4 @@ Embeddings are stored in a vector database by first converting data, such as tex
Learn more from the following resources:
- [@article@Indexing & Embeddings](https://docs.llamaindex.ai/en/stable/understanding/indexing/indexing/)
- [@video@Vector Databases simply explained! (Embeddings & Indexes)](https://www.youtube.com/watch?v=dN0lsF2cvm4)
- [@video@Vector Databases Simply Explained! (Embeddings & Indexes)](https://www.youtube.com/watch?v=dN0lsF2cvm4)

@ -5,3 +5,4 @@ AI Engineering is the process of designing and implementing AI systems using pre
Learn more from the following resources:
- [@video@AI vs Machine Learning](https://www.youtube.com/watch?v=4RixMPF4xis)
- [@article@AI Engineering](https://en.wikipedia.org/wiki/Artificial_intelligence_engineering)

@ -4,5 +4,6 @@ LanceDB is a vector database designed for efficient storage, retrieval, and mana
Learn more from the following resources:
- [@official@LanceDB Website](https://lancedb.com/)
- [@official@LanceDB](https://lancedb.com/)
- [@official@LanceDB Documentation](https://docs.lancedb.com/enterprise/introduction)
- [@opensource@LanceDB on GitHub](https://github.com/lancedb/lancedb)

@ -4,5 +4,5 @@ LangChain is a development framework that simplifies building applications power
Learn more from the following resources:
- [@official@LangChain Website](https://www.langchain.com/)
- [@official@LangChain](https://www.langchain.com/)
- [@video@What is LangChain?](https://www.youtube.com/watch?v=1bUy-1hGZpI)

@ -4,5 +4,5 @@ Pre-trained models, while powerful, come with several limitations and considerat
Learn more from the following resources:
- [@article@Pretrained Topic Models: Advantages and Limitation](https://www.kaggle.com/code/amalsalilan/pretrained-topic-models-advantages-and-limitation)
- [@article@Pre-trained Topic Models: Advantages and Limitation](https://www.kaggle.com/code/amalsalilan/pretrained-topic-models-advantages-and-limitation)
- [@video@Should You Use Open Source Large Language Models?](https://www.youtube.com/watch?v=y9k-U9AuDeM)

@ -4,5 +4,5 @@ LlamaIndex, formerly known as GPT Index, is a tool designed to facilitate the in
Learn more from the following resources:
- [@official@llamaindex Website](https://docs.llamaindex.ai/en/stable/)
- [@official@Llama Index](https://docs.llamaindex.ai/en/stable/)
- [@video@Introduction to LlamaIndex with Python (2024)](https://www.youtube.com/watch?v=cCyYGYyCka4)

@ -4,5 +4,5 @@ LlamaIndex enables multi-modal apps by linking language models (LLMs) to diverse
Learn more from the following resources:
- [@official@LlamaIndex Multy-modal](https://docs.llamaindex.ai/en/stable/use_cases/multimodal/)
- [@official@LlamaIndex Multi-modal](https://docs.llamaindex.ai/en/stable/use_cases/multimodal/)
- [@video@Multi-modal Retrieval Augmented Generation with LlamaIndex](https://www.youtube.com/watch?v=35RlrrgYDyU)

@ -5,5 +5,5 @@ LLMs, or Large Language Models, are advanced AI models trained on vast datasets
Learn more from the following resources:
- [@article@What is a large language model (LLM)?](https://www.cloudflare.com/en-gb/learning/ai/what-is-large-language-model/)
- [@video@How Large Langauge Models Work](https://www.youtube.com/watch?v=5sLYAQS9sWQ)
- [@video@How Large Language Models Work](https://www.youtube.com/watch?v=5sLYAQS9sWQ)
- [@video@Large Language Models (LLMs) - Everything You NEED To Know](https://www.youtube.com/watch?v=osKyvYJ3PRM)

@ -1,6 +1,6 @@
# Manual Implementation
Services like [Open AI functions](https://platform.openai.com/docs/guides/function-calling) and Tools or [Vercel's AI SDK](https://sdk.vercel.ai/docs/foundations/tools) make it really easy to make SDK agents however it is a good idea to learn how these tools work under the hood. You can also create fully custom implementation of agents using by implementing custom loop.
Services like Open AI functions and Tools or Vercel's AI SDK make it really easy to make SDK agents however it is a good idea to learn how these tools work under the hood. You can also create fully custom implementation of agents using by implementing custom loop.
Learn more from the following resources:

@ -3,5 +3,5 @@
The OpenAI Embeddings API allows developers to generate dense vector representations of text, which capture semantic meaning and relationships. These embeddings can be used for various tasks, such as semantic search, recommendation systems, and clustering, by enabling the comparison of text based on similarity in vector space. The API supports easy integration and scalability, making it possible to handle large datasets and perform tasks like finding similar documents, organizing content, or building recommendation engines.
Learn more from the following resources:
- [@offical@OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings/create)
- [@video@Master OpenAI EMBEDDING API](https://www.youtube.com/watch?v=9oCS-VQupoc)
- [@official@OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings/create)
- [@video@Master OpenAI Embedding API](https://www.youtube.com/watch?v=9oCS-VQupoc)

@ -4,5 +4,5 @@ Open-source models are freely available for customization and collaboration, pro
Learn more from the following resources:
- [@article@OpenAI vs. open-source LLM](https://ubiops.com/openai-vs-open-source-llm/)
- [@video@AI360 | Open-Source vs Closed-Source LLMs](https://www.youtube.com/watch?v=710PDpuLwOc)
- [@article@OpenAI vs. Open Source LLM](https://ubiops.com/openai-vs-open-source-llm/)
- [@video@Open-Source vs Closed-Source LLMs](https://www.youtube.com/watch?v=710PDpuLwOc)

@ -1,3 +1,7 @@
# OpenAI API
The OpenAI API provides access to powerful AI models like GPT, Codex, DALL-E, and Whisper, enabling developers to integrate capabilities such as text generation, code assistance, image creation, and speech recognition into their applications via a simple, scalable interface.
Learn more from the following resources:
- [@official@Open AI API](https://openai.com/api/)

@ -4,6 +4,6 @@ Pinecone is a managed vector database designed for efficient similarity search a
Learn more from the following resources:
- [@official@Pinecone Website](https://www.pinecone.io)
- [@official@Pinecone](https://www.pinecone.io)
- [@article@Everything you need to know about Pinecone](https://www.packtpub.com/article-hub/everything-you-need-to-know-about-pinecone-a-vector-database?srsltid=AfmBOorXsy9WImpULoLjd-42ERvTzj3pQb7C2EFgamWlRobyGJVZKKdz)
- [@video@Introducing Pinecone Serverless](https://www.youtube.com/watch?v=iCuR6ihHQgc)

@ -4,4 +4,4 @@ Pre-trained models are Machine Learning (ML) models that have been previously tr
Visit the following resources to learn more:
- [@article@Pre-trained models: Past, present and future](https://www.sciencedirect.com/science/article/pii/S2666651021000231)
- [@article@Pre-trained Models: Past, Present and Future](https://www.sciencedirect.com/science/article/pii/S2666651021000231)

@ -4,5 +4,5 @@ Prompt engineering is the process of crafting effective inputs (prompts) to guid
Learn more from the following resources:
- [@roadmap@Prompt Engineering Roadmap](https://roadmap.sh/prompt-engineering)
- [@roadmap@Visit DedicatedPrompt Engineering Roadmap](https://roadmap.sh/prompt-engineering)
- [@video@What is Prompt Engineering?](https://www.youtube.com/watch?v=nf1e-55KKbg)

@ -4,6 +4,6 @@ Qdrant is an open-source vector database designed for efficient similarity searc
Learn more from the following resources:
- [@official@Qdrant Website](https://qdrant.tech/)
- [@official@Qdrant](https://qdrant.tech/)
- [@opensource@Qdrant on GitHub](https://github.com/qdrant/qdrant)
- [@video@Getting started with Qdrant](https://www.youtube.com/watch?v=LRcZ9pbGnno)

@ -1,3 +1,8 @@
# RAG & Implementation
Retrieval-Augmented Generation (RAG) combines information retrieval with language generation to produce more accurate, context-aware responses. It uses two components: a retriever, which searches a database to find relevant information, and a generator, which crafts a response based on the retrieved data. Implementing RAG involves using a retrieval model (e.g., embeddings and vector search) alongside a generative language model (like GPT). The process starts by converting a query into embeddings, retrieving relevant documents from a vector database, and feeding them to the language model, which then generates a coherent, informed response. This approach grounds outputs in real-world data, resulting in more reliable and detailed answers.
Learn more from the following resources:
- [@article@What is RAG?](https://aws.amazon.com/what-is/retrieval-augmented-generation/)
- [@video@What is Retrieval-Augmented Generation? IBM](https://www.youtube.com/watch?v=T-D1OfcDW1M)
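
Illustrative note (not part of this commit): the retrieve-then-generate flow described in the RAG paragraph can be sketched roughly as follows. This is a minimal sketch under loud assumptions: `embed` is a toy word-count stand-in for a real embedding model, `generate` is a placeholder for a call to a language model, and every function name here is hypothetical rather than taken from any library.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): lowercase word counts, not a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Retriever: rank documents by similarity to the query, keep the top k."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_docs):
    """Place the retrieved documents in the prompt so the answer is grounded in them."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


def generate(prompt):
    """Generator placeholder (assumption): a real system would call an LLM here."""
    return f"[model response to a {len(prompt)}-character prompt]"


DOCS = [
    "Pinecone is a managed vector database.",
    "Whisper converts speech to text.",
    "Qdrant is an open-source vector database.",
]


def answer(query, documents=DOCS, k=2):
    """Full pipeline: retrieve relevant documents, then generate from them."""
    return generate(build_prompt(query, retrieve(query, documents, k)))
```

Calling `answer("Which vector database is open-source?")` retrieves the two vector-database sentences and passes them to the placeholder generator. In a real pipeline the toy `embed` and the in-memory list would typically be replaced by an embedding model and a vector database.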

@ -1,4 +1,4 @@
# Roles and Responsiblities
# Roles and Responsibilities
AI Engineers are responsible for designing, developing, and deploying AI systems that solve real-world problems. Their roles include building machine learning models, implementing data processing pipelines, and integrating AI solutions into existing software or platforms. They work on tasks like data collection, cleaning, and labeling, as well as model training, testing, and optimization to ensure high performance and accuracy. AI Engineers also focus on scaling models for production use, monitoring their performance, and troubleshooting issues. Additionally, they collaborate with data scientists, software developers, and other stakeholders to align AI projects with business goals, ensuring that solutions are reliable, efficient, and ethically sound.

@ -4,5 +4,5 @@ Embeddings are used for semantic search by converting text, such as queries and
Learn more from the following resources:
- [@article@What is semantic search?](https://www.elastic.co/what-is/semantic-search)
- [@video@What is Semantic Search? Cohere](https://www.youtube.com/watch?v=fFt4kR4ntAA)
- [@article@What is Semantic Search?](https://www.elastic.co/what-is/semantic-search)
- [@video@What is Semantic Search? - Cohere](https://www.youtube.com/watch?v=fFt4kR4ntAA)

@ -4,6 +4,6 @@ In the context of multimodal AI, speech-to-text technology converts spoken langu
Learn more from the following resources:
- [@article@What is speech to text? Amazon](https://aws.amazon.com/what-is/speech-to-text/)
- [@article@Turn speech into text using Google AI](https://cloud.google.com/speech-to-text)
- [@article@What is Speech to Text?](https://aws.amazon.com/what-is/speech-to-text/)
- [@article@Turn Speech into Text using Google AI](https://cloud.google.com/speech-to-text)
- [@article@How is Speech to Text Used?](https://h2o.ai/wiki/speech-to-text/)

@ -4,5 +4,5 @@ Supabase Vector is an extension of the Supabase platform, specifically designed
Learn more from the following resources:
- [@official@Supabase Vector website](https://supabase.com/vector)
- [@official@Supabase Vector](https://supabase.com/vector)
- [@video@Supabase Vector: The Postgres Vector database](https://www.youtube.com/watch?v=MDxEXKkxf2Q)

@ -1,6 +1,6 @@
# Using SDKs Directly
While tools like Langchain and LlamaIndex make it easy to implement RAG, you don't have to necessarily learn and use them. If you know about the different steps of implementing RAG you can simply do it all yourself e.g. do the chunking using @langchain/textsplitters package, create embeddings using any LLM e.g. use OpenAI Embedding API through their SDK, save the embeddings to any vector database e.g. if you are using Supabase Vector DB, you can use their SDK and similarly you can use the relevant SDKs for the rest of the steps as well.
While tools like Langchain and LlamaIndex make it easy to implement RAG, you don't have to necessarily learn and use them. If you know about the different steps of implementing RAG you can simply do it all yourself e.g. do the chunking using `@langchain/textsplitters` package, create embeddings using any LLM e.g. use OpenAI Embedding API through their SDK, save the embeddings to any vector database e.g. if you are using Supabase Vector DB, you can use their SDK and similarly you can use the relevant SDKs for the rest of the steps as well.
Learn more from the following resources:
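
Illustrative note (not part of this commit): the chunking step mentioned in the paragraph above can be approximated in a few lines without any framework. This is a rough sketch only; `chunk_text` is a hypothetical helper, not the API of `@langchain/textsplitters`, and it splits on raw character offsets rather than on sentences or tokens.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap by `overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

Each chunk would then be embedded and stored through whichever SDKs the project uses. The overlap keeps text that straddles a boundary present in both neighbouring chunks, at the cost of some duplicated storage.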

@ -1,3 +1,8 @@
# Vector Databases
Vector databases are systems specialized in storing, indexing, and retrieving high-dimensional vectors, often used as embeddings for data like text, images, or audio. Unlike traditional databases, they excel at managing unstructured data by enabling fast similarity searches, where vectors are compared to find the closest matches. This makes them essential for tasks like semantic search, recommendation systems, and content discovery. Using techniques like approximate nearest neighbor (ANN) search, vector databases handle large datasets efficiently, ensuring quick and accurate retrieval even at scale.
Learn more from the following resources:
- [@article@Vector Databases](https://developers.cloudflare.com/vectorize/reference/what-is-a-vector-database/)
- [@article@What are Vector Databases?](https://www.mongodb.com/resources/basics/databases/vector-databases)
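
Illustrative note (not part of this commit): the similarity search described in the Vector Databases paragraph can be shown with an exact, brute-force version. This is a sketch under stated assumptions: `TinyVectorStore` is a hypothetical in-memory class, the vectors are made up, and it compares the query against every stored vector, which is the work that approximate nearest neighbor (ANN) indexes are designed to avoid at scale.

```python
import heapq
import math


class TinyVectorStore:
    """Hypothetical in-memory store with exact cosine-similarity search."""

    def __init__(self):
        self._items = []  # list of (item_id, unit-length vector)

    @staticmethod
    def _normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("cannot index or query a zero vector")
        return [x / norm for x in vector]

    def add(self, item_id, vector):
        """Store a vector under an id, normalized so dot product equals cosine."""
        self._items.append((item_id, self._normalize(vector)))

    def search(self, query, k=3):
        """Return the k most similar items as (item_id, similarity) pairs."""
        q = self._normalize(query)
        scored = (
            (sum(a * b for a, b in zip(q, v)), item_id)
            for item_id, v in self._items
        )
        return [(item_id, score) for score, item_id in heapq.nlargest(k, scored)]


store = TinyVectorStore()
store.add("cat", [1.0, 0.1, 0.0])
store.add("dog", [0.9, 0.2, 0.0])
store.add("car", [0.0, 0.1, 1.0])
nearest = store.search([1.0, 0.0, 0.0], k=2)
```

Here `nearest` lists `"cat"` and then `"dog"`, the two vectors pointing in nearly the same direction as the query. The linear scan costs time proportional to the number of stored vectors per query, so it serves only as a reference point for what an ANN index trades a little accuracy to speed up.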

@ -1,3 +1,8 @@
# Video Understanding
Video understanding with multimodal AI involves analyzing and interpreting both visual and audio content to provide a more comprehensive understanding of videos. Common use cases include video summarization, where AI extracts key scenes and generates summaries; content moderation, where the system detects inappropriate visuals or audio; and video indexing for easier search and retrieval of specific moments within a video. Other applications include enhancing video-based recommendations, security surveillance, and interactive entertainment, where video and audio are processed together for real-time user interaction.
Learn more from the following resources:
- [@article@Video Understanding](https://dl.acm.org/doi/10.1145/3503161.3551600)
- [@opensource@Awesome LLM for Video Understanding](https://github.com/yunlong10/Awesome-LLMs-for-Video-Understanding)

@ -4,5 +4,5 @@ Weaviate is an open-source vector database that allows users to store, search, a
Learn more from the following resources:
- [@official@Weaviate Website](https://weaviate.io/)
- [@official@Weaviate](https://weaviate.io/)
- [@video@Advanced AI Agents with RAG](https://www.youtube.com/watch?v=UoowC-hsaf0&list=PLTL2JUbrY6tVmVxY12e6vRDmY-maAXzR1)

@ -1,3 +1,8 @@
# What are Embeddings
Embeddings are dense, numerical vector representations of data, such as words, sentences, images, or audio, that capture their semantic meaning and relationships. By converting data into fixed-length vectors, embeddings allow machine learning models to process and understand the data more effectively. For example, word embeddings represent similar words with similar vectors, enabling tasks like semantic search, recommendation systems, and clustering. Embeddings make it easier to compare, search, and analyze complex, unstructured data by mapping similar items close together in a high-dimensional space.
Visit the following resources to learn more:
- [@official@Introducing Text and Code Embeddings](https://openai.com/index/introducing-text-and-code-embeddings/)
- [@article@What are Embeddings](https://www.cloudflare.com/learning/ai/what-are-embeddings/)

@ -4,6 +4,6 @@ Prompts for the OpenAI API are carefully crafted inputs designed to guide the la
Learn more from the following resources:
- [@roadmap@](https://roadmap.sh/prompt-engineering)
- [@roadmap@Visit Dedicated Prompt Engineering Roadmap](https://roadmap.sh/prompt-engineering)
- [@article@How to write AI prompts](https://www.descript.com/blog/article/how-to-write-ai-prompts)
- [@article@Prompt Engineering Guide](https://www.promptingguide.ai/)