A 30-minute phone screen, after which I made it to the technical session: two hours of technical and programming questions. The interviewers were nice, and overall it was a good experience.
I applied through a recruitment agency. The process took 2 weeks. I interviewed at Fidelity Investments (West Lake Hills, TX) in Oct. 2025.
Interview
One round of Zoom interview lasting an hour. A couple of Indian guys did the interview; one is a technical project manager and the other is most likely a team lead, from South India I guess. Prepare for the questions below and you will get through the interview if these two are interviewing.
Interview questions [1]
Question 1
1. What is the difference between a chatbot and a virtual assistant?
2. What is the tech stack of the conversational AI Assistant?
3. Where is the application hosted?
4. What is RAG?
5. Can you talk more about Retrieval and Generation?
6. Can you discuss the retrieval part of the RAG architecture?
7. How do you chunk a PDF document for retrieval in RAG?
8. Why is chunking important in RAG?
9. Is text splitting alone enough for good context generation, or are there other ways to create chunks?
10. Why do we choose a vector database for RAG?
11. How do you generate an answer from the retrieved chunks in RAG?
12. Which language model did you use in your project for answer generation?
13. How can you fine-tune the answers generated by the models?
14. What is embedding and what is embedding size?
15. What are the trade-offs between higher and lower embedding sizes?
16. What methodologies do you use to decide on embedding size and model before designing the system?
17. What metrics do you use to choose the embedding model, considering its impact on RAG performance?
18. What is your approach for selecting the embedding model?
19. What should you consider and how do you structure prompts when writing them?
20. Give an example from your last project of how you wrote prompts.
21. How do you make a call to a language model that is hosted remotely, from a technical perspective?
22. If I submit the same prompt to the model again, will I get the same answer?
23. How can we make sure the model gives consistent answers every time for the same question?
24. How do you evaluate your prompts and the model’s answers?
25. When should you use few-shot prompting, and what is it?
26. Can you talk about chain-of-thought prompting?
27. What is chain-of-thought prompting and how does it help?
28. Why should you use LangChain instead of calling LLM model APIs directly?
29. Can you explain the simple architecture of LangChain and how to use it?
30. How would you build a simple chat app with LangChain that reads a database schema from a text file and answers user questions?
31. Why should you use LangGraph instead of step functions for creating workflows?
32. Can you quickly walk through a high-level architecture for a multi-agent system using LangGraph?
33. In a multi-agent system, how does the system decide which agent to use for a user question?
34. Is there a way to control cyclic iterations in LangGraph?
35. For a predefined document processing workflow (like checking for signatures), should you use LangGraph or something else?
36. Why not just use LangChain for simple, predefined workflows instead of LangGraph?
37. What information do you need to provide when creating a custom tool for an LLM so the model can use it?
38. If there is a very big document and RAG (Retrieval-Augmented Generation) is not used, what can you do?
39. What is a ReAct agent and what does it do?
40. How do you make sure every FastAPI endpoint validates an authorization token before doing its job?
41. Can dependency injection be used in FastAPI to handle authorization token validation for endpoints?
42. How do you create a FastAPI endpoint with a request and response body?
43. Have you designed any chatbots with conversation history enabled?
44. Do you know anything about context management in chatbots?
45. Do you know about context poisoning and context engineering in chatbots?
46. In a RAG (Retrieval-Augmented Generation) system, how do you ensure the generated answer is relevant before returning it to the user?
47. Can you explain what you mean by "role-based conditioning" in the context of LLMs?
48. How do you run LLM solutions using Python FastAPI?
49. How do you deploy and run a Python FastAPI application, both locally and in production?
50. How do you run and test a Python FastAPI application on your local machine?
51. Do you use an IDE or the terminal to start the FastAPI server?
52. How do you start a FastAPI server on a specific port for local debugging?
53. What does the main:app part mean in the Uvicorn command when starting a FastAPI server?
54. What is the best way to organize a FastAPI codebase?
55. How do you handle exceptions in FastAPI?
56. How can you add request parameter validation in FastAPI, for example, to ensure a question does not exceed 100 characters?
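For anyone preparing for the chunking questions (13 to 15), here is a minimal sketch of fixed-size chunking with overlap. It assumes the PDF text has already been extracted into a plain string. The function name `chunk_text` and the parameters `chunk_size` and `overlap` are made up for illustration and do not come from the interview or from any library. Character-based splitting is only the simplest option; splitting on sentences, headings, or semantic boundaries are other approaches worth mentioning for question 15.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration only. The overlap repeats the
    tail of each chunk at the head of the next one, so a sentence cut at
    a boundary still appears whole in at least one chunk when it is
    shorter than the overlap.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk is emitted that only repeats the overlap.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

In a RAG pipeline each chunk would then be embedded and stored in the vector database; a larger overlap preserves more context across boundaries at the cost of storing more duplicated text.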