A project that I built for the embedded security team during my internship at Bosch. I really loved this project, but honestly, I don't love security that much 🥹. This was also the first time I coded extensively, writing about 1,700 LoC (lines of code) in Python for the backend.
WARNING: None of the code below is the real code from the project; it's just a sketch of the concepts.
1. Source code parsing

from clang import cindex

index = cindex.Index.create()
tu = index.parse(
    path,  # the path to the file I need to parse
    # I need this option so the parser doesn't ignore macros (such as #define)
    options=cindex.TranslationUnit.PARSE_DETAILED_PROCESSING_RECORD,
)
# parse function for C source code
def parse(self):
    for cursor in self.tu.cursor.walk_preorder():
        match cursor.kind:
            # if you want to parse typedefs
            case CursorKind.TYPEDEF_DECL:
                ...
            # if you want to parse macros
            case CursorKind.MACRO_DEFINITION:
                ...
            # if you want to parse functions
            case CursorKind.FUNCTION_DECL:
                ...
# we can even get the body of a function
# just get the start line and the end line
def get_full_code(self, cursor, path):
    extent = cursor.extent
    start = extent.start
    end = extent.end

    # read the source code
    with open(path, "r") as f:
        content = f.readlines()
    return "".join(content[start.line - 1 : end.line]).strip()
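To show how these pieces fit together, here is a small self-contained sketch. The CParser wrapper class, the functions() helper, and the filtering on cursor.location.file are my own illustration on top of libclang, not the real project code.

from clang.cindex import CursorKind, Index, TranslationUnit

class CParser:
    def __init__(self, path):
        self.path = path
        self.tu = Index.create().parse(
            path,
            options=TranslationUnit.PARSE_DETAILED_PROCESSING_RECORD,
        )

    def get_full_code(self, cursor, path):
        # same idea as above: slice the file between the cursor's start and end lines
        extent = cursor.extent
        with open(path, "r") as f:
            content = f.readlines()
        return "".join(content[extent.start.line - 1 : extent.end.line]).strip()

    def functions(self):
        # collect every function defined in this file (skip anything pulled in via #include)
        result = []
        for cursor in self.tu.cursor.walk_preorder():
            if cursor.kind != CursorKind.FUNCTION_DECL:
                continue
            if cursor.location.file is None or cursor.location.file.name != self.path:
                continue
            result.append((cursor.spelling, self.get_full_code(cursor, self.path)))
        return result

# usage
parser = CParser("example.c")
for name, body in parser.functions():
    print(name, body, sep="\n")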
2. "LLM" architecture
a. The backend
To support multiple backends, I just use a simple switch-case and inheritance:
class Backend:
    # the model is a pydantic object with settings related to the LLM
    def __init__(self, model):
        # some settings here
        ...

        # because I need to support APIs,
        # I use a new feature in LangChain called InMemoryRateLimiter to limit API usage
        # https://api.python.langchain.com/en/latest/rate_limiters/langchain_core.rate_limiters.InMemoryRateLimiter.html

    @property
    def model(self):
        # this is the part that differs for each backend
        raise NotImplementedError
For example, I can add OpenAI and Ollama backends as shown below. I cache the model in _model to avoid recreating it every time model is accessed.
class OpenAIBackend(Backend):
    @property
    def model(self):
        if not hasattr(self, "_model"):
            self._model = ChatOpenAI(
                # put all the settings you want here
                ...
            )
        return self._model
class OllamaBackend(Backend):
    @property
    def model(self):
        if not hasattr(self, "_model"):
            self._model = ChatOllama(
                # put all the settings you want here
                ...
            )
        return self._model
And now I use a simple factory design to create the backend (just another switch-case):
def create_backend(backend_type):
    match backend_type:
        case "openai":
            ...  # create the openai backend
        # add more cases if you want to support more backends
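As a concrete, purely illustrative version of that factory, assuming the two backend classes above and the same pydantic settings object that Backend.__init__ takes:

def create_backend(backend_type, model):
    # model is the pydantic settings object passed to Backend.__init__
    match backend_type:
        case "openai":
            return OpenAIBackend(model)
        case "ollama":
            return OllamaBackend(model)
        case _:
            # add more cases here to support more backends
            raise ValueError(f"Unsupported backend: {backend_type}")

# usage (`model` here is your pydantic settings instance);
# the chat model is only built on first access and cached in _model
backend = create_backend("ollama", model)
llm = backend.model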
b. Detailed implementation

The architecture is inspired heavily by this LangGraph tutorial. However, after a while, I couldn't figure out how to implement memory into one of the nodes (probably due to skill issues), so I decided to build my own. It's quite simple: just create a model for each node (though this isn't great for performance) and then use conditions to decide which node to go to next. The crucial part is the prompt. Since I use a local LLM model (through llama.cpp), if the prompt is poor, the performance suffers significantly, even causing the LLM to fail to generate answers in the Route node.
In the Route LLM, there are two options: use structured_output, a new way to generate structured answers supported by LangChain, or prompt the model to generate structured output like JSON. I chose the latter because the first option has many bugs: if the LLM can't generate "structured output", LangChain won't return an answer. With the JSON approach, the model will always generate something, and if the answer isn't in JSON format, I simply route it to the database.
# just copy from https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_adaptive_rag_local/#components
ROUTE_SYSTEM_PROMPT = """You are an expert at routing a user question to a vectorstore or web search.
The vectorstore contains documents related to code in project {project}.
Use the vectorstore for questions on these topics. For all else, and especially for current events, use web-search.
Return JSON with single key, datasource, that is 'websearch' or 'vectorstore' depending on the question."""
# and I add a user prompt to be more strict
ROUTE_USER_PROMPT = """Below is the question, read it carefully and tell me what route I should use
{question}"""
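Wired together, the routing fallback I described could look roughly like this. RouteLLM is a hypothetical class; only the prompts above and the standard LangChain ChatPromptTemplate are taken as given.

import json

from langchain_core.prompts import ChatPromptTemplate

class RouteLLM:
    def __init__(self, backend, project):
        prompt = ChatPromptTemplate.from_messages([
            ("system", ROUTE_SYSTEM_PROMPT),
            ("human", ROUTE_USER_PROMPT),
        ])
        self.chain = prompt | backend.model
        self.project = project

    def run(self, question):
        answer = self.chain.invoke({"project": self.project, "question": question})
        try:
            # the prompt asks for JSON with a single "datasource" key
            return json.loads(answer.content)["datasource"]
        except (json.JSONDecodeError, KeyError):
            # if the model fails to produce valid JSON, fall back to the vectorstore
            return "vectorstore"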
In the RAG LLM, to improve the accuracy, I use long context reorder. For the memory LLM, I just copied 100% from this tutorial but adjusted it a little bit (so 99%, and I'm in love with LangGraph right now, fantastic framework). But why do I need RAG and another LLM with memory just to generate code? At the early stage, I wanted a model that could retrieve conditionally (but you know, I couldn't, due to skill issues again), so I had to use RAG: retrieve one time only, put all the context into the memory LLM, and finally use a syntax checker (in C, I just use clang -fsyntax-only) to check the code, keep generating, and just loop the process. Below is the RAG implementation:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import DocumentCompressorPipeline
from langchain_community.document_transformers import LongContextReorder

class RAG:
    def __init__(self, backend, database):
        # some settings here
        ...

        # create the retriever
        self.retriever = ContextualCompressionRetriever(
            base_compressor=DocumentCompressorPipeline(transformers=[LongContextReorder()]),
            base_retriever=database.as_retriever(
                # some settings in the retriever
                search_type="mmr",
                search_kwargs={"k": 7},
            ),
        )

        # and then create the RAG chain
        # get_qa_prompt() is the system prompt that I use to provide context for generating code
        self.qa_chain = create_stuff_documents_chain(backend.model, get_qa_prompt())
        self.rag = create_retrieval_chain(self.retriever, self.qa_chain)

    # run rag
    def run(self, question):
        return self.rag.invoke({"input": question})
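For reference, get_qa_prompt() could be as simple as the following; the wording is mine, but the {context} and {input} placeholders are what create_retrieval_chain and create_stuff_documents_chain expect:

from langchain_core.prompts import ChatPromptTemplate

def get_qa_prompt():
    return ChatPromptTemplate.from_messages([
        ("system",
         "You are a C programming assistant. "
         "Use the following retrieved source code as context when you generate code:\n\n{context}"),
        ("human", "{input}"),
    ])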
Here's the implementation of the syntax checking system:
import subprocess
from enum import Enum, auto

class ProgramError(Enum):
    # syntax error
    SYNTAX = auto()
    # we can actually extend this to more errors
    # such as compile, linking, ...
    # LINKING = auto()
    COMPILE = auto()
class CheckError(Exception):
    def __init__(self, message, code):
        super().__init__(message)
        self.code = code
class Codechecker:
    ...
def check_syntax(self, path: str):
    # we use clang to check syntax,
    # so we need to write the code from the LLM to a file before checking
    try:
        subprocess.run(["command to check syntax"], check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as exc:
        # we need the error message to feed back to the LLM
        raise CheckError(exc.stderr, ProgramError.SYNTAX)
# we can extend this to check linking and compilation too, or more, I don't know
def check_compile(self, path: str):
    try:
        subprocess.run(["command to check compile"], check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as exc:
        # we need the error message to feed back to the LLM
        raise CheckError(exc.stderr, ProgramError.COMPILE)
# now we do a sequential check
def check_program(self, path: str):
    self.check_syntax(path)
    self.check_compile(path)
# now we do the error-checking loop
def run(self, rag, question):
    # generate_code is a function that generates code from the RAG,
    # writes this code to a file,
    # and returns the code and the path of the file
    code, path = self.generate_code(rag, question)

    # in RAG, I implement a function to get the retrieved context,
    # and MemoryLLM will extract this context from the RAG
    llm = MemoryLLM(rag)

    # now run the checking loop
    while True:
        try:
            self.check_program(path)
            # check ok
            return code
        except CheckError as e:
            match e.code:
                case ProgramError.SYNTAX:
                    new_question = "Generate prompt based on syntax error"
                    ...
                case ProgramError.COMPILE:
                    new_question = "Generate prompt based on compile error"
                    ...
            code, path = self.generate_code(llm, new_question)
            # NOTE: you can actually stop earlier,
            # but the above is the main implementation;
            # I just drop all the details
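As the note says, you can stop earlier. A bounded version of the same loop might look like this; max_retries and the exact feedback prompt are my additions, not the real project code:

# a bounded variant of Codechecker.run
def run(self, rag, question, max_retries=3):
    code, path = self.generate_code(rag, question)
    llm = MemoryLLM(rag)

    for _ in range(max_retries):
        try:
            self.check_program(path)
            return code
        except CheckError as e:
            # feed the clang error message back to the LLM
            new_question = f"Fix the following {e.code.name} error in the generated code:\n{e}"
            code, path = self.generate_code(llm, new_question)

    # give up after max_retries attempts and return the last candidate
    return code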
Cool, now we just combine everything to create a big loop, it's really easy:
class Pipeline:
    # I assume you have initialized all the needed LLMs
    ...

    def run(self, question):
        route = self.route_llm.run(question)

        if route == "vectorstore":
            # self.code_checker is an instance of Codechecker
            return self.code_checker.run(self.rag, question)
        elif route == "websearch":
            # self.web_llm is an instance of WebLLM
            return self.web_llm.run(question)
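Using it end to end then looks something like this (the Pipeline constructor arguments are an assumption, since I dropped its initialization above):

# assuming the constructor wires up the route LLM, RAG, code checker and web LLM
pipeline = Pipeline(backend=backend, database=database)
answer = pipeline.run("Write a function that parses the sensor frame defined in frame.h")
print(answer)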
3. The API
Now we move to the API part: how I implemented an API to create a good backend (not really good, but I want to use this buzzword LOL). The first thing I considered is how to manage all the created objects, because users can call the API to create objects whenever they want, so I cannot create empty objects beforehand. In Python, creating an object at runtime is really easy:
# this is the main class of the application,
# but in real projects, I implement it as a singleton class
# to make sure the whole application cannot be changed at runtime
class App:
    def register_attr(self, name, value):
        """
        name: Attribute name for access via self.name
        value: Value to assign to the "name" attribute
        """
        if name == "backend":
            # do something to create the llm (or backend, because I want to call it that way)
            # it can be:
            self._create_attr("backend", OpenAIBackend(value))
            # you can create more than that
            self._create_attr("rag", RAG(self.backend))
            # and more ...
        # put the creation logic you want here as a new branch

    def _create_attr(self, name, value):
        # we can use hasattr to check if an attribute has already been created
        if hasattr(self, name):
            # delete the old attribute
            delattr(self, name)
        setattr(self, name, value)
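Since I mentioned the singleton: one common way to get that behaviour in Python is to override __new__, roughly like this (again just a sketch, not the project code):

class App:
    _instance = None

    def __new__(cls, *args, **kwargs):
        # every App() call returns the same object,
        # so the whole application shares one state at runtime
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance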
Now we go to the API. I use FastAPI and Pydantic, so it is really easy. From the example below, I hope you can create more APIs:
from fastapi import FastAPI, HTTPException

api = FastAPI()  # named api so it doesn't clash with app, the App instance

@api.post("/backend")
def create_backend(model: BackendModel):
    """BackendModel is a Pydantic data class"""
    # app is an instance of App
    try:
        app.register_attr("backend", model)
    except SomeErrorFromApp as e:
        # you can add error handling to register_attr in the App class
        raise HTTPException(status_code=404, detail=str(e))

    return {"message": "Create backend successfully"}
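And as one more made-up example of extending the API, an endpoint that forwards a question to the pipeline could look like this; QuestionModel and the "pipeline" attribute are assumptions:

from pydantic import BaseModel

class QuestionModel(BaseModel):
    question: str

@api.post("/question")
def ask_question(body: QuestionModel):
    # the pipeline is assumed to be registered on `app` the same way as the backend
    if not hasattr(app, "pipeline"):
        raise HTTPException(status_code=404, detail="Pipeline has not been created yet")
    return {"answer": app.pipeline.run(body.question)}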
Ok cool, so we just walked through a brief discussion of my implementation of the API backend. Now we go to the hardest part, the frontend. Because I use a VS Code extension as the frontend, yeah, having no knowledge of JavaScript is killing me.
Thank you so much for reading. Thanks to my team at Bosch for helping me; I really love the Bosch culture 🥹.