
Use AI to build your house!

When a new housing society emerges, residents inevitably create chat groups to connect and share information using apps like WhatsApp and Telegram. In India, Telegram seems to be the favorite, as it offers generous group limits, admin tools, and other features. These virtual communities become treasure troves of invaluable insights. But whatever app you use, there is always the problem of finding the right information at the right time. Sure, the apps have a "Search" button, but it is limited to keyword search and is of little use when you have to sift through thousands of messages.

I found myself in this situation when it was my turn to start an interior design project for my home. Despite being part of a vibrant Telegram group where countless residents had shared their experiences with various interior designers and companies, I struggled to unearth the pearls of wisdom buried in the chat's depths. Then I remembered that I could take advantage of AI, particularly LLMs, to sift through this data.

Telegram has a nifty feature that lets you request a data export of any group you are part of. The export is generally available 24 hours after the request. Since my group had more than 50,000 messages, the data arrived as multiple HTML files containing the user chats.

Telegram Group Data Dump

I needed to extract this into a simple text document that I could feed to an LLM.

I wrote a simple C# top-level program to do this (it uses the HtmlAgilityPack NuGet package to parse the HTML):

using HtmlAgilityPack;

string? location;
do
{
    Console.WriteLine("Enter the location where Telegram chats are stored...");
    location = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(location))
    {
        Console.WriteLine("Invalid location");
    }
} while (string.IsNullOrWhiteSpace(location));

//find all html files in the location
string[] htmlFiles = Directory.GetFiles(location, "*.html");
Console.WriteLine("Found " + htmlFiles.Length + " files");
foreach (string htmlFile in htmlFiles)
{
    Console.WriteLine($"Processing {htmlFile}");
    //load html file
    string html = File.ReadAllText(htmlFile);
    //extract text from div
    List<string> allChats = ExtractTextFromDiv(html);
    //write text to file
    File.AppendAllLines("chatDump.txt", allChats);
    Console.WriteLine($"Wrote {allChats.Count} chats.");
}


static List<string> ExtractTextFromDiv(string html)
{
    List<string> allChats = [];
    // Load the HTML document
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    // Find the div with class "text"
    var textDiv = doc.DocumentNode.SelectNodes("//div[@class='text']");
    if (textDiv == null || textDiv.Count == 0)
    {
        Console.WriteLine("No text div found");
        return allChats;
    }

    foreach (var node in textDiv)
    {
        // Keep only non-empty messages that mention interiors
        if (!string.IsNullOrWhiteSpace(node.InnerText) && node.InnerText.Contains("interior"))
            allChats.Add(node.InnerText);
    }

    return allChats;
}

Finally, I narrowed it down to ~1,300 messages. Next, I split these into chunks of 200 messages each to stay well within the context token limit of Anthropic's Claude Sonnet LLM, and used Amazon Bedrock to extract the information.
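The chunking step can be sketched as follows. This is a minimal sketch: `ChunkMessages` is a hypothetical helper, not part of the original program, and it assumes the filtered chats have already been collected (for example, from chatDump.txt).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Chunker
{
    // Split the filtered chats into batches of `batchSize` messages,
    // joining each batch into a single newline-separated string
    // that can be passed as the prompt body.
    public static List<string> ChunkMessages(IEnumerable<string> chats, int batchSize = 200)
    {
        return chats
            .Select((chat, index) => (chat, index))
            .GroupBy(x => x.index / batchSize)
            .Select(g => string.Join("\n", g.Select(x => x.chat)))
            .ToList();
    }
}
```

Each element of the returned list can then be passed to the Bedrock call below; at 200 messages per batch, ~1,300 messages produce 7 batches.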

// Requires the AWSSDK.BedrockRuntime NuGet package and these namespaces:
// Amazon, Amazon.BedrockRuntime, Amazon.BedrockRuntime.Model, Amazon.Util, System.Text.Json.Nodes
static async Task<string> InvokeClaudeAsync(string messages)
{
    string claudeModelId = "anthropic.claude-3-sonnet-20240229-v1:0";
    string prompt = $"""
        You are provided with chat messages below. Find out the name and phone numbers
        of interior companies where users had a good experience
        and the work was completed with good quality and on time.
        Messages start below:
        {messages}
        """;
    AmazonBedrockRuntimeClient client = new(RegionEndpoint.USEast1);

    // Claude 3 models on Bedrock require the Messages API payload format
    // (the older "Human:/Assistant:" text-completions format only works
    // with Claude 2.x and earlier):
    string payload = new JsonObject()
            {
                { "anthropic_version", "bedrock-2023-05-31" },
                { "max_tokens", 4096 },
                { "temperature", 0.5 },
                { "messages", new JsonArray(
                    new JsonObject { { "role", "user" }, { "content", prompt } }) }
            }.ToJsonString();

    string generatedText = "";
    try
    {
        InvokeModelResponse response = await client.InvokeModelAsync(new InvokeModelRequest()
        {
            ModelId = claudeModelId,
            Body = AWSSDKUtils.GenerateMemoryStreamFromString(payload),
            ContentType = "application/json",
            Accept = "application/json"
        });

        if (response.HttpStatusCode == System.Net.HttpStatusCode.OK)
        {
            var result = await JsonNode.ParseAsync(response.Body);
            return result?["content"]?[0]?["text"]?.GetValue<string>() ?? "";
        }
        else
        {
            Console.WriteLine("InvokeModelAsync failed with status code " + response.HttpStatusCode);
        }
    }
    catch (AmazonBedrockRuntimeException e)
    {
        Console.WriteLine(e.Message);
    }
    return generatedText;
}

Each batch returned a neat summary of the relevant information:

Based on the chat messages, here are the names and phone numbers of interior companies or persons who received positive feedback for good quality work delivered on time:

1. Redacted - One user mentioned finalizing interiors with them.

2. Redacted - A user had a pleasant experience with them for interiors at Redacted.

3. Redacted - A user's friend got good interiors done by them at Redacted.

4. Redacted - Recommended by a user for their great work on a villa.

5. Redacted - Quoted around 17L for a 2.5 BHK at Redacted, and one user liked their work.

6. Redacted - Did the model flat interiors at Redacted. A user mentioned they are open to discounts for bulk orders.

Some other companies mentioned without specific feedback were Redacted, Redacted and Redacted. A few users cautioned against Redacted based on poor experiences at Redacted.

The results were nothing short of remarkable. I received a meticulously curated summary, complete with names, phone numbers, and feedback on interior companies that had garnered praise for their quality work and timely delivery. Unexpectedly, the LLM even went the extra mile, cautioning against companies with poor track records, ensuring I had a well-rounded perspective. Now I had seven sets of responses, which I fed back to the LLM to generate a neat consolidated list of good and bad companies!
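The final consolidation pass can be sketched like this. It is a hedged sketch: `BuildConsolidationPrompt` is a hypothetical helper not in the original post, and the resulting prompt would be sent through the same Bedrock invocation shown earlier.

```csharp
using System;
using System.Collections.Generic;

static class Consolidator
{
    // Combine the per-batch summaries into one prompt asking the model
    // to merge them into a single de-duplicated list of good and bad companies.
    public static string BuildConsolidationPrompt(IReadOnlyList<string> batchSummaries)
    {
        string combined = string.Join("\n\n---\n\n", batchSummaries);
        return $"""
            Below are {batchSummaries.Count} summaries of chat message batches.
            Consolidate them into a single de-duplicated list of recommended
            interior companies and a separate list of companies to avoid.
            Summaries start below:
            {combined}
            """;
    }
}
```

The prompt string produced here would simply be passed to the same InvokeModelAsync call, so no new Bedrock plumbing is needed for the final pass.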

Was using an LLM an overkill for this task? Perhaps. Was it fun? You bet!
