
Use AI to build your house!

When a new housing society comes up, residents inevitably create chat groups to connect and share information, using apps like WhatsApp and Telegram. In India, Telegram seems to be the favorite as it offers generous group limits, admin tools, and other features. These virtual communities become treasure troves of invaluable insights. But whatever app you use, there is always the problem of finding the right information at the right time. Sure, the apps have a "Search" button, but it is pretty much limited to keyword search and is useless when you have to dig through thousands of messages.

I found myself in this situation when it was my turn to start an interior design project for my home. Despite being part of a vibrant Telegram group where countless residents had shared their experiences with various interior designers and companies, I struggled to unearth the pearls of wisdom buried in the chat's depths. Then I remembered that I could take advantage of AI, particularly LLMs, to sift through this data.

Telegram has a nifty feature that lets you request a data dump of any group you are part of. The dump is generally available about 24 hours after the request. Since my group had more than 50,000 messages, the data came as multiple HTML files containing the user chats.

Telegram Group Data Dump

I needed to extract this into a simple text document that I could feed to an LLM.

I wrote a simple C# program to do this:

// Requires the HtmlAgilityPack NuGet package for parsing the exported HTML
using HtmlAgilityPack;

string? location;
do
{
    Console.WriteLine("Enter the location where Telegram chats are stored...");
    location = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(location))
    {
        Console.WriteLine("Invalid location");
    }
} while (string.IsNullOrWhiteSpace(location));

//find all html files in the location
string[] htmlFiles = Directory.GetFiles(location, "*.html");
Console.WriteLine("Found " + htmlFiles.Length + " files");
foreach (string htmlFile in htmlFiles)
{
    Console.WriteLine($"Processing {htmlFile}");
    //load html file
    string html = File.ReadAllText(htmlFile);
    //extract text from div
    List<string> allChats = ExtractTextFromDiv(html);
    //write text to file
    File.AppendAllLines("chatDump.txt", allChats);
    Console.WriteLine($"Wrote {allChats.Count} chats.");
}


static List<string> ExtractTextFromDiv(string html)
{
    List<string> allChats = [];
    // Load the HTML document
    var doc = new HtmlDocument();
    doc.LoadHtml(html);

    // Find the div with class "text"
    var textDiv = doc.DocumentNode.SelectNodes("//div[@class='text']");
    if (textDiv == null || textDiv.Count == 0)
    {
        Console.WriteLine("No text div found");
        return allChats;
    }

    foreach (var node in textDiv)
    {
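        // keep only non-empty messages that mention "interior"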
        if (!string.IsNullOrWhiteSpace(node.InnerText) && node.InnerText.Contains("interior"))
            allChats.Add(node.InnerText);
    }

    return allChats;
}

Finally, I could narrow it down to roughly 1,300 messages. Next, I chunked these messages into batches of 200 each to stay within the context token limit of Anthropic's Claude Sonnet LLM, and used Amazon Bedrock to extract the information.

// Requires the AWSSDK.BedrockRuntime NuGet package
// (namespaces used: Amazon, Amazon.BedrockRuntime, Amazon.BedrockRuntime.Model, Amazon.Util, System.Text.Json.Nodes)
static async Task<string> InvokeClaudeAsync(string messages)
{
    string claudeModelId = "anthropic.claude-3-sonnet-20240229-v1:0";
    string prompt = $"""
        You are provided with chat messages below. Find out the name and phone numbers
        of interior companies where users had a good experience
        and the work was completed with good quality and on time.
        Messages start below:
        {messages}
        """;
    AmazonBedrockRuntimeClient client = new(RegionEndpoint.USEast1);

    // Claude 3 models on Bedrock use the Messages API, so the prompt goes into a
    // "messages" array rather than the older "Human:/Assistant:" prompt format.
    string payload = new JsonObject()
            {
                { "anthropic_version", "bedrock-2023-05-31" },
                { "max_tokens", 4096 }, // maximum tokens to generate, not the context window
                { "temperature", 0.5 },
                { "messages", new JsonArray(
                    new JsonObject { { "role", "user" }, { "content", prompt } }) }
            }.ToJsonString();

    string generatedText = "";
    try
    {
        InvokeModelResponse response = await client.InvokeModelAsync(new InvokeModelRequest()
        {
            ModelId = claudeModelId,
            Body = AWSSDKUtils.GenerateMemoryStreamFromString(payload),
            ContentType = "application/json",
            Accept = "application/json"
        });

        if (response.HttpStatusCode == System.Net.HttpStatusCode.OK)
        {
            JsonNode? result = await JsonNode.ParseAsync(response.Body);
            // The Messages API returns the generated text under content[0].text
            return result?["content"]?[0]?["text"]?.GetValue<string>() ?? "";
        }
        else
        {
            Console.WriteLine("InvokeModelAsync failed with status code " + response.HttpStatusCode);
        }
    }
    catch (AmazonBedrockRuntimeException e)
    {
        Console.WriteLine(e.Message);
    }
    return generatedText;
}
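
The batching loop itself isn't shown above; a minimal sketch could look like this, assuming the filtered messages are read back from chatDump.txt and grouped into batches of 200 with LINQ's Chunk (the summary_N.txt file names are just for illustration):

// Minimal sketch of the batching loop (assumes chatDump.txt was produced by the extraction program above)
string[] filteredChats = File.ReadAllLines("chatDump.txt");

int batchNumber = 1;
foreach (string[] batch in filteredChats.Chunk(200))
{
    // join the batch into a single block of text and summarize it with Bedrock
    string batchText = string.Join(Environment.NewLine, batch);
    string summary = await InvokeClaudeAsync(batchText);
    File.WriteAllText($"summary_{batchNumber}.txt", summary);
    Console.WriteLine($"Batch {batchNumber} summarized.");
    batchNumber++;
}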

Now I got a neat summary of information for each batch:

Based on the chat messages, here are the names and phone numbers of interior companies or persons who received positive feedback for good quality work delivered on time:

1. Redacted - One user mentioned finalizing interiors with them.

2. Redacted - A user had a pleasant experience with them for interiors at Redacted.

3. Redacted - A user's friend got good interiors done by them at Redacted.

4. Redacted - Recommended by a user for their great work on a villa.

5. Redacted - Quoted around 17L for a 2.5 BHK at Redacted, and one user liked their work.

6. Redacted - Did the model flat interiors at Redacted. A user mentioned they are open to discounts for bulk orders.

Some other companies mentioned without specific feedback were Redacted, Redacted and Redacted. A few users cautioned against Redacted based on poor experiences at Redacted.

The results were nothing short of remarkable. I received a meticulously curated summary, complete with names, phone numbers, and feedback on interior companies that had garnered praise for their quality work and timely delivery. Unexpectedly, the LLM even went the extra mile, cautioning against companies with poor track records, ensuring I had a well-rounded perspective. Now I had 7 sets of responses, which I fed back to the LLM to generate a neat list of good and bad companies!
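
That final consolidation pass can reuse the same InvokeClaudeAsync helper. A rough sketch, assuming the batch summaries were saved as summary_*.txt files by the loop above (in practice the prompt inside the helper would be tweaked to ask for a consolidated list of good and bad companies):

// Rough sketch of the final pass: feed all the batch summaries back to the model
// (assumes the summaries were written out as summary_*.txt by the batching loop)
string[] summaryFiles = Directory.GetFiles(".", "summary_*.txt");
string combinedSummaries = string.Join(Environment.NewLine, summaryFiles.Select(File.ReadAllText));

string finalList = await InvokeClaudeAsync(combinedSummaries);
Console.WriteLine(finalList);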

Was using an LLM overkill for this task? Perhaps. Was it fun? You bet!
