Building an RSS Feed Aggregator with Go and ChatGPT [Part 1/3]

Build an automatically generated weekly digest, delivered right to your mailbox, with Golang and ChatGPT!

I've recently quit my job. I am looking for a new one, but I have had no luck landing a backend developer job with my Node.js knowledge. So, I decided to learn other languages. I started the journey with Rust. It was enlightening, but there are hardly any jobs for a junior Rust developer.

Enter Go, which has far more adoption in the industry as a backend language compared to Node.js and Rust. Go is good for mainly two reasons. It is relatively fast ( faster than interpreted languages, a little slower than Rust, C, etc. ). It has really good developer ergonomics: fast compile times, easy syntax, and no async / await.

After learning the basics through A Tour of Go, I've decided to extend my understanding by building a real-world application. Not the RealWorld real-world, but the actual real world.

The Idea

The application will aggregate new blog posts from various RSS feeds and notify the user with a newsletter of interesting posts. To make things interesting, the application will filter the posts according to:

  • How much the post is related to the tags/topics the user is interested in,
  • The popularity of the post.

The application then will generate summaries of each post using ChatGPT. And the newsletter will have a relatively simple design, such as:

Hello {Subscriber.Name},

Here are some blog posts you might find interesting:

#ForEach Post in Posts
    {Post.Name} - by {Post.Author || Post.Website}
        {Post.SummaryByChatGPT}
#ForEachEnd


Click here -Unsubscribe Link- if you don't want to receive newsletters anymore
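
The draft above maps fairly directly onto Go's built-in text/template package. The following is only a minimal sketch of that idea, not the final implementation: the Post, Subscriber, and newsletterData types, the Summary and UnsubscribeURL field names, and the renderNewsletter function are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Post and Subscriber are hypothetical types; their fields follow the
// placeholders used in the draft newsletter.
type Post struct {
	Name    string
	Author  string
	Website string
	Summary string
}

type Subscriber struct {
	Name string
}

// newsletterData is the (assumed) value handed to the template.
type newsletterData struct {
	Subscriber     Subscriber
	Posts          []Post
	UnsubscribeURL string
}

// The if/else falls back to the website when a post has no author.
const newsletterTmpl = `Hello {{.Subscriber.Name}},

Here are some blog posts you might find interesting:

{{range .Posts}}    {{.Name}} - by {{if .Author}}{{.Author}}{{else}}{{.Website}}{{end}}
        {{.Summary}}
{{end}}
Click here {{.UnsubscribeURL}} if you don't want to receive newsletters anymore
`

// renderNewsletter fills the template and returns the plain-text body.
func renderNewsletter(sub Subscriber, posts []Post, unsubscribeURL string) (string, error) {
	tmpl, err := template.New("newsletter").Parse(newsletterTmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	err = tmpl.Execute(&sb, newsletterData{
		Subscriber:     sub,
		Posts:          posts,
		UnsubscribeURL: unsubscribeURL,
	})
	if err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	posts := []Post{
		{Name: "Post A", Author: "Ada", Summary: "Summary A"},
		{Name: "Post B", Website: "example.com", Summary: "Summary B"},
	}
	out, err := renderNewsletter(Subscriber{Name: "Grace"}, posts, "https://example.com/unsubscribe")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(out)
}
```

Since the newsletter is plain text, text/template is used here; an HTML newsletter would call for html/template instead, which escapes the inserted values.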

My initial thought about the application structure on a high level is as follows:

The first few plans emerge from the diagram:

  • Find some RSS sources,
  • Find a suitable package for getting RSS sources,
  • Find a DB driver package ( probably will use MongoDB ),
  • Find how to trigger cron jobs in Go ( I was using BullMQ in Node.js projects ),
  • Find an email composer and email sender package.

Additionally, there seem to be a few endpoints I can add to the application:

  • Classic sign-up / sign-in endpoints,
  • Newsletter management endpoints: subscribe / unsubscribe,
  • User interests management endpoint: PUT /user/interests

To keep things simple, I will manage RSS sources via configuration files, and any change will require a system restart.

Notice, I've also used a single PUT request for user interests. This is again for simplicity. I am anticipating that users will have a few interests and I don't need to manage them via POST, DELETE endpoints. The list can be calculated on the client side.

The Plan

At first, all the implementations will be written in an unstructured way. I am sure most of the code will look ugly to an experienced Go developer. But hey, my mission is to make the application work first, then fix the issues, refactor some parts, or do some performance optimizations.

In such scenarios, I always remind myself: Build something working first, others can be achieved later.

Setting up the HTTP Server

I know building an HTTP server in Go is relatively simple. The built-in HTTP library is pretty darn good. For now, I will put everything inside the main.go file:

package main

import (
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/user/sign-in", signin)
	http.HandleFunc("/user/sign-up", signup)
	http.HandleFunc("/user/interests", interests)
	http.HandleFunc("/newsletter/subscribe", subscribe)
	http.HandleFunc("/newsletter/unsubscribe", unsubscribe)

	// Crash & Log, in case anything happens during server startup
	log.Fatal(http.ListenAndServe(":3000", nil))
}

// User
func signup(w http.ResponseWriter, r *http.Request) {}
func signin(w http.ResponseWriter, r *http.Request) {}

// User Interests
func interests(w http.ResponseWriter, r *http.Request) {}

// Newsletter
func subscribe(w http.ResponseWriter, r *http.Request)   {}
func unsubscribe(w http.ResponseWriter, r *http.Request) {}

I believe that is all for the HTTP server. One thing I've found missing compared to Express.js style HTTP servers is that you can't define the method on the router level ( Go 1.22 and later do accept patterns such as "PUT /user/interests", but earlier versions do not ). Handlers are responsible for all the HTTP methods ( GET, POST, PUT, etc. ).

The conventional way to handle this, even in the official documentation, is to add a switch statement on the request method inside each handler. I didn't add them here ( because they would take up too much space in the blog post ).
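
For reference, here is a rough sketch of what such a switch could look like for the interests handler. It is not part of the project code: the in-memory userInterests slice stands in for the database, and the call helper exists only so the handler can be exercised without starting a server. Both are assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// userInterests is a hypothetical in-memory stand-in for the database.
var (
	mu            sync.Mutex
	userInterests = []string{}
)

// interests returns the list on GET and replaces the whole list on PUT.
// Every other method gets a 405 response.
func interests(w http.ResponseWriter, r *http.Request) {
	switch r.Method {
	case http.MethodGet:
		mu.Lock()
		defer mu.Unlock()
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(userInterests)
	case http.MethodPut:
		var list []string
		if err := json.NewDecoder(r.Body).Decode(&list); err != nil {
			http.Error(w, "invalid JSON body", http.StatusBadRequest)
			return
		}
		mu.Lock()
		userInterests = list
		mu.Unlock()
		w.WriteHeader(http.StatusNoContent)
	default:
		w.Header().Set("Allow", "GET, PUT")
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
	}
}

// call runs the handler against an in-memory request and returns the status code.
func call(method, body string) int {
	req := httptest.NewRequest(method, "/user/interests", strings.NewReader(body))
	rec := httptest.NewRecorder()
	interests(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(call(http.MethodPut, `["go","rust"]`)) // 204
	fmt.Println(call(http.MethodDelete, ""))            // 405
	fmt.Println(userInterests)                          // [go rust]
}
```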

Adding the RSS Reader Library

There are various packages for reading & parsing RSS feeds, but one package has significantly higher usage: gofeed. This package has a clean API. I've tested it with the RSS feed from Noop Today.

package main

import (
	"fmt"
	"github.com/mmcdole/gofeed"
)

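func main() {
	fp := gofeed.NewParser()
	feed, err := fp.ParseURL("https://nooptoday.com/rss")
	if err != nil {
		// Without this check, a failed request leaves feed nil and the loop below panics
		fmt.Println(err)
		return
	}

	for _, post := range feed.Items {
		fmt.Println(post.Title)
	}
	// ...
	// snip
}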

The program outputs:

Lecture Notes: NLP with Deep Learning - 1
Top 5 Struggles of Backend Developers
Reduce Network Usage - With This 1 Simple Trick!
FeathersJS Vs NestJS - Compared in 3 Key Areas
Scalable Websocket Server Implemented by ChatGPT
How to Trace Websocket Events in NestJS - [Case Study]
Why Websockets are Hard To Scale?
Best Way to Create Dynamic Modules in NestJS
Using Custom Decorators in NestJS

That was easier than I expected 😅.

While at it, I will add a way to get RSS sources from the configuration. I want to keep things as simple as possible. The configuration file will be a simple text file with each line containing an RSS source.

These will be the RSS sources.

https://nooptoday.com/rss
https://dev.to/rss
https://hackernoon.com/feed
https://www.cdn.geeksforgeeks.org/feed

With the following addition to the main function, the program can be given RSS sources from a configuration file: rss_sources.

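func main() {
	// Needs "bufio", "fmt", "log", and "os" in the import block
	readFile, err := os.Open("rss_sources")
	if err != nil {
		log.Fatal(err)
	}
	defer readFile.Close()

	// bufio.Scanner splits its input into lines by default
	fileScanner := bufio.NewScanner(readFile)
	var rssSources []string
	for fileScanner.Scan() {
		rssSources = append(rssSources, fileScanner.Text())
	}
	if err := fileScanner.Err(); err != nil {
		log.Fatal(err)
	}

	fmt.Println(rssSources)
}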

Setting up the Database Connection

It was really easy to connect to the database. I don't understand the Context object in Go yet. So, I skipped creating a context with a timeout, and I skipped handling the error cases. I did the same in almost all parts of the project. But those can be resolved later.

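func main() {
	clientOpts := options.Client().ApplyURI("mongodb://localhost:27017/?connect=direct")
	// The error is ignored for now, so a failed connection goes unnoticed
	client, _ := mongo.Connect(context.TODO(), clientOpts)
	_ = client // Go refuses to compile unused variables; the client is used in the next snippet
}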

I tested if things are working correctly with the following code:

func main() {
	// Errors are deliberately ignored in this quick test
	clientOpts := options.Client().ApplyURI("mongodb://localhost:27017/?connect=direct")
	client, _ := mongo.Connect(context.TODO(), clientOpts)
	cursor, _ := client.Database("rss_aggregator").Collection("posts").Find(context.TODO(), bson.D{})
	defer cursor.Close(context.TODO())
	for cursor.Next(context.TODO()) {
		var result bson.D
		cursor.Decode(&result)
		fmt.Println(result)
	}
}
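
As for the context I skipped: as far as I understand it, a context with a timeout is a way to tell a slow call when to give up. The sketch below uses only the standard library. slowOperation is a made-up stand-in for a database call, and runWithTimeout and timedOut are hypothetical helpers, so treat this as an illustration rather than the project's code. With the MongoDB driver, the same ctx would be passed to mongo.Connect or Find in place of context.TODO().

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowOperation is a hypothetical stand-in for a database call. It finishes
// after workMs milliseconds unless the context is cancelled first.
func slowOperation(ctx context.Context, workMs int) error {
	select {
	case <-time.After(time.Duration(workMs) * time.Millisecond):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// runWithTimeout gives slowOperation at most timeoutMs milliseconds to finish.
func runWithTimeout(timeoutMs, workMs int) error {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMs)*time.Millisecond)
	defer cancel() // always release the timer, even when the call finishes early
	return slowOperation(ctx, workMs)
}

// timedOut reports whether err was caused by the deadline passing.
func timedOut(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println(runWithTimeout(200, 10))           // <nil>
	fmt.Println(timedOut(runWithTimeout(10, 500))) // true
}
```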

It works!

We have two missing parts in the project: a ChatGPT client and a mail client.

Writing the ChatGPT Client

There will be only a single HTTP request going to the ChatGPT API. There is no need to add another package to the project for this.

For now, I don't want to create a separate ChatGPT client. This allows me to keep things simple. Also, I am trying not to organize things prematurely.

func createBlogSummary(blogTitle string, blogContent string) string {
	// Send a request to ChatGPT to create a summary
	return "" // placeholder so the stub compiles
}

A function named createBlogSummary is enough for this use case. I specifically designed a prompt for this. If you are interacting with ChatGPT via API, you can add system prompts to your completion requests.

{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You are highly capable of writing articles and creating a short summary for them. Help the user for creating short summaries of articles."
    },
    {
      "role": "user",
      "content": "Provide a short summary of the following blog post: Title: ${blogTitle}, Content: ${blogContent}"
    }
  ]
}

I tested this prompt with this blog post's content ( up until this paragraph ), and it gave me a fairly good summary:

{
	"id": "chatcmpl-7dMPR2tsVa1QjDPY0QMN9e4abGKNL",
	"object": "chat.completion",
	"created": 1689615289,
	"model": "gpt-3.5-turbo-0613",
	"choices": [
		{
			"index": 0,
			"message": {
				"role": "assistant",
				"content": "This blog post discusses the author's decision to learn the programming language Go after facing difficulties finding a backend developer job with their knowledge in Node.js. They share their journey of building a real-world application using Go, specifically an RSS feed aggregator that generates summaries of blog posts using ChatGPT. The author outlines the initial plans for the application, including finding RSS sources, setting up a database connection, and integrating a ChatGPT client. They also discuss setting up an HTTP server using the built-in HTTP library in Go. The blog post concludes by mentioning the next steps, such as adding a mail client and finalizing the implementation of the ChatGPT client."
			},
			"finish_reason": "stop"
		}
	],
	"usage": {
		"prompt_tokens": 986,
		"completion_tokens": 132,
		"total_tokens": 1118
	}
}
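
In that response, the summary text sits at choices[0].message.content. A minimal sketch of pulling it out with encoding/json could look like the following. The ChatGPTResponse struct and the extractSummary function are names made up for this example, and the struct only covers the fields visible in the response above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ChatGPTResponse mirrors only the parts of the response that are needed
// to read the summary. The type name is an assumption for this sketch.
type ChatGPTResponse struct {
	Choices []struct {
		Message struct {
			Role    string `json:"role"`
			Content string `json:"content"`
		} `json:"message"`
		FinishReason string `json:"finish_reason"`
	} `json:"choices"`
}

// extractSummary decodes a chat completion response body and returns
// the content of the first choice.
func extractSummary(body []byte) (string, error) {
	var resp ChatGPTResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", err
	}
	if len(resp.Choices) == 0 {
		return "", errors.New("response contains no choices")
	}
	return resp.Choices[0].Message.Content, nil
}

func main() {
	body := []byte(`{"choices":[{"index":0,"message":{"role":"assistant","content":"A short summary."},"finish_reason":"stop"}]}`)
	summary, err := extractSummary(body)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(summary)
}
```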

Let's fill in the function body to send a request to ChatGPT API:


func createBlogSummary(blogTitle string, blogContent string) string {
	chatGPTRequest := ChatGPTRequest{
		Model: ChatGPTTurbo,
		Messages: []ChatGPTRequestMessage{{
			Role:    ChatGPTSystem,
			Content: "You are highly capable of writing articles and creating a short summary for them. Help the user for creating short summaries of articles.",
		}, {
			Role:    ChatGPTUser,
			Content: "Provide a short summary of the following blog post: Title: " + blogTitle + " Content: " + blogContent,
		}},
	}

	requestBody, err := json.Marshal(chatGPTRequest)
	if err != nil {
		// Return early instead of log.Fatal, which would take the whole server down
		fmt.Println(err)
		return ""
	}

	token := os.Getenv("OPENAI_TOKEN")
	requestBytes := bytes.NewBuffer(requestBody)
	request, err := http.NewRequest(http.MethodPost, "https://api.openai.com/v1/chat/completions", requestBytes)
	if err != nil {
		fmt.Println(err)
		return ""
	}
	request.Header.Set("Content-Type", "application/json")
	request.Header.Set("Authorization", "Bearer "+token)

	client := http.Client{}
	response, err := client.Do(request)
	if err != nil {
		fmt.Println(err)
		return ""
	}
	defer response.Body.Close()

	body, err := io.ReadAll(response.Body)
	if err != nil {
		fmt.Println(err)
		return ""
	}

	// This is the raw JSON response; the summary text is in choices[0].message.content
	return string(body)
}

I didn't include the struct definitions for ChatGPTRequest, but you can imagine it is a simple struct with the fields in the request body.
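
For completeness, the definitions could look roughly like this. The names match the ones used in createBlogSummary, but the exact shapes ( untyped string constants, the json tags ) and the marshalRequest helper are assumptions for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Model and role names used in the request body.
const (
	ChatGPTTurbo  = "gpt-3.5-turbo"
	ChatGPTSystem = "system"
	ChatGPTUser   = "user"
)

// ChatGPTRequestMessage is a single entry of the "messages" array.
type ChatGPTRequestMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// ChatGPTRequest is the body sent to the chat completions endpoint.
type ChatGPTRequest struct {
	Model    string                  `json:"model"`
	Messages []ChatGPTRequestMessage `json:"messages"`
}

// marshalRequest is a hypothetical helper that shows the JSON a request produces.
func marshalRequest(req ChatGPTRequest) (string, error) {
	b, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := marshalRequest(ChatGPTRequest{
		Model: ChatGPTTurbo,
		Messages: []ChatGPTRequestMessage{
			{Role: ChatGPTSystem, Content: "Help the user create short summaries of articles."},
			{Role: ChatGPTUser, Content: "Provide a short summary of the following blog post: ..."},
		},
	})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(body)
}
```

The json tags matter here: without them, the fields would be marshalled with their capitalized Go names ( Model, Messages ), which do not match the keys in the request body shown earlier.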

Also, at first, I tried to send the request in the following form:

http.Post("https://api.openai.com/v1/chat/completions", "application/json", requestBytes)

But http.Post only lets you set the content type, so there is no way to add other headers such as Authorization. So, I used the http.NewRequest API, which allows for modifying headers before sending the request.

Okay, the last functioning part we need is a mail client.

Mail Client

Surprise, surprise. I am running a self-hosted mail server. I can send emails from the server using SMTP. I am happy that Go has a built-in SMTP client. I used it according to the documentation and this is the function that will send newsletters to users:

func sendNewsletter(to string, content string) {
	server := os.Getenv("SMTP_SERVER")
	fromIdentity := os.Getenv("SMTP_FROM_IDENTITY")
	fromMail := os.Getenv("SMTP_FROM_MAIL")
	password := os.Getenv("SMTP_PASSWORD")

	// The From header expects the address in angle brackets after the display name
	msg := []byte("From: " + fromIdentity + " <" + fromMail + ">\r\n" +
		"To: " + to + "\r\n" +
		"Subject: Newsletter: Weekly Blog Summaries\r\n" +
		"\r\n" + content + "\r\n")

	// PlainAuth takes ( identity, username, password, host ); identity is usually left as ""
	err := smtp.SendMail(
		server+":587",
		smtp.PlainAuth(fromIdentity, fromMail, password, server),
		fromMail,
		[]string{to},
		msg)

	if err != nil {
		fmt.Println(err)
	}
}

When I first tried to log the environment variables, they didn't show up 😞. Go doesn't read the .env file on its own, so I had to add a package to the project to load environment variables from it. I've used the https://github.com/joho/godotenv/ package. It is very simple to use. You only need to add the following code at the beginning of the main function.

func main() {
	err := godotenv.Load()
	if err != nil {
		log.Fatal("Error loading .env file")
	}
}

Summary

Now, the project has the building blocks for most of the functional requirements:

  • Users can sign-in/sign-up
  • Users can manage their interests and they can subscribe/unsubscribe to the newsletter
  • The project can read RSS feeds from a predefined list of RSS sources
  • The project can create summaries of blog posts using ChatGPT API
  • The project can send emails to users

What is next?

In the following post, I need to merge all the functionalities to work together. The project is missing a user interface. Also, one functional requirement is still missing: running cron jobs in Go. So far, that hasn't been a problem, since the project is not ready yet.

Help wanted!

I am doing this project to learn the Go language. Just because I have a functioning program doesn't mean it is written in a conventional way. I am sure there are "Go ways of doing things" that I missed. Please let me know in the comments what could have been written better.