I have a couple of ideas/projects that require getting details from a URL and displaying them with a nice UI "component". One such idea is having better links in the References section of each post. A native list is currently being used, but it would be nice to display the title of each link without compromising my writing experience.
Requirements
- Build a simple service to power the References on articles
- The frontend sends links
- The service makes a request to each link and returns the following:
  - Title
  - SEO image
  - Short description
  - Favicon URL
The title will be the only parameter used in the first iteration of this feature.
Thought process
I'll be trying a different approach this time around. Rather than spending time doing a lot of research, I'll come up with a quick solution first, then research areas of improvement. Here's a breakdown of a quick solution:
- Make a request to the specified endpoint
- Check for a `2xx` status code
- Parse the HTML document
- Return the parsed content to the client
Implementation
I need an endpoint to make a request to the specified URL and return the parsed content. For a start, I need these functions:
- `FetchPageDetails()`: An HTTP handler that initiates a request to the specified URL
- `parseHTML()`: An internal function that processes the result of the HTTP request
- `parseFaviconURL()`: Builds the full URL for the favicon if only the path is provided
- `isFullURL()`: Checks if a URL contains the host/domain name
```sh
mkdir bookmark-manager && cd bookmark-manager
go mod init github.com/odujokod/bookmark-manager
```
With the project in place, I created `main.go`, `bookmark_test.go` and `bookmark.go` in the root directory.
Fetching the HTML
To validate my thought process, I wrote the test to fetch a page given a URL, checking to see if a `200` response is returned. Then I'm able to implement the feature to make the test pass:
```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"testing"
)

func TestFetchPage(t *testing.T) {
	externalURL := "https://google.com"
	path := "/fetch"
	url := fmt.Sprintf("%s?url=%s", path, externalURL)
	req, _ := http.NewRequest(http.MethodGet, url, nil)
	res := httptest.NewRecorder()

	FetchPageDetails(res, req)

	expected := 200
	got := res.Result().StatusCode

	if got != expected {
		t.Errorf("Expected: %d, got %d\n", expected, got)
	}
}
```
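The post moves on before showing this first version of the handler, so here's a minimal sketch of a `FetchPageDetails` that would make the test pass. The query-parameter validation and the error responses are my assumptions, not the author's exact code:

```go
// A minimal first pass: validate the query parameter, fetch the page,
// and check for a 2xx status, per the quick-solution breakdown above.
func FetchPageDetails(w http.ResponseWriter, r *http.Request) {
	pageURL := r.URL.Query().Get("url")
	if pageURL == "" {
		http.Error(w, "missing url query parameter", http.StatusBadRequest)
		return
	}

	res, err := http.Get(pageURL)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer res.Body.Close()

	// Check for a 2xx status code before doing any further work.
	if res.StatusCode < 200 || res.StatusCode >= 300 {
		http.Error(w, "upstream returned a non-2xx status", http.StatusBadGateway)
		return
	}

	w.WriteHeader(http.StatusOK)
}
```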
Parsing the HTML
With the page now being fetched, I need to get the necessary details for the frontend. From the requirements, the necessary details can be found in the `<head>` tag. This makes parsing slightly easier. Over to the test:
```go
func TestParseHTML(t *testing.T) {
	// I can actually read this from the sample.html file
	sampleHTML := `<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="UTF-8">
	<meta name="viewport" content="width=device-width, initial-scale=1.0">
	<meta name="description" content="Description goes here">
	<meta name="og:title" content="Go test">
	<meta name="og:description" content="Description goes here">
	<meta name="og:image" content="https://cdn1.iconfinder.com/data/icons/google-s-logo/150/Google_Icons-09-1024.png">
	<title>Go test</title>
</head>
<body>
	<div>
		Hello world
	</div>
</body>
</html>`

	htmlBytes := []byte(sampleHTML)

	got, err := ParseHTML(htmlBytes)
	if err != nil {
		t.Errorf("Unable to parse HTML: %v", err)
	}

	expectedTitle := "Go test"

	if got.Title != expectedTitle {
		t.Errorf("Expected: %s, got: %s", expectedTitle, got.Title)
	}
}
```
The test gives an insight into the implementation of the feature. I'll need an HTML parser that lets me walk through the HTML tree with ease. I found GoQuery, a library built on top of the `net/html` package, to handle the HTML parsing:
```sh
go get github.com/PuerkitoBio/goquery
```
With GoQuery installed, I can now implement the parsing logic:
```go
import (
	// other imports
	"bytes"
	"strings"

	"github.com/PuerkitoBio/goquery"
)

type Bookmark struct {
	Title       string `json:"title"`
	Description string `json:"description"`
	FaviconURL  string `json:"faviconURL"`
	ImageURL    string `json:"imageURL"`
}

func ParseHTML(html []byte) (Bookmark, error) {
	doc, err := goquery.NewDocumentFromReader(bytes.NewBuffer(html))
	if err != nil {
		return Bookmark{}, err
	}

	bookmark := Bookmark{}

	title := strings.Trim(doc.Find("title").Text(), "\n ")
	bookmark.Title = title

	doc.Find("meta").Each(func(i int, s *goquery.Selection) {
		c, _ := s.Attr("name")
		value, _ := s.Attr("content")

		switch c {
		case "description", "og:description":
			bookmark.Description = value
		case "og:image":
			bookmark.ImageURL = value
		default:
		}
	})

	return bookmark, nil
}
```
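One caveat worth flagging (my observation, not from the original post): the sample document declares its Open Graph tags with the `name` attribute, but the Open Graph protocol specifies `property` (e.g. `<meta property="og:title" ...>`), so many real pages won't match this switch. Inside the `Each` callback, a more forgiving version could fall back between the two:

```go
// Hypothetical tweak for the first line of the Each callback: prefer
// `property` (what the Open Graph protocol specifies) and fall back to
// `name`, which the sample HTML above uses.
c, _ := s.Attr("name")
if prop, exists := s.Attr("property"); exists {
	c = prop
}
```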
Handling favicons
I considered using the favicon for the frontend component, so I decided to extend the response. Favicons can be specified with a fully qualified URL or a resource path. It would be easier to have a single representation for it. To do this, I need to check if the URL is a resource path or not. For a resource path, I simply append it to the main URL:
```go
func TestIsFullURL(t *testing.T) {
	cases := []struct {
		input    string
		expected bool
	}{
		{
			input:    "https://static.ietf.org/dt/12.31.0/ietf/images/ietf-logo-nor-16.png",
			expected: true,
		},
		{
			input:    "/favicon.svg",
			expected: false,
		},
		{
			input:    "/favicon.ico",
			expected: false,
		},
	}

	for _, c := range cases {
		got, err := IsFullURL(c.input)
		if err != nil {
			t.Errorf("Error checking path: %v", err)
		}

		if got != c.expected {
			t.Errorf("Expected: %v, got: %v", c.expected, got)
		}
	}
}
```
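The post doesn't show the implementation of these helpers, so here's a sketch built on the standard `net/url` package. `IsFullURL`'s signature comes from the test above; `parseFaviconURL` and its signature are my guesses:

```go
import "net/url"

// IsFullURL reports whether the string is an absolute URL, i.e. it
// carries both a scheme and a host rather than being a bare path.
func IsFullURL(rawURL string) (bool, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false, err
	}
	return u.Scheme != "" && u.Host != "", nil
}

// parseFaviconURL resolves a favicon reference against the page URL when
// only a resource path (e.g. "/favicon.ico") is provided.
func parseFaviconURL(pageURL, favicon string) (string, error) {
	full, err := IsFullURL(favicon)
	if err != nil {
		return "", err
	}
	if full {
		return favicon, nil
	}

	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(favicon)
	if err != nil {
		return "", err
	}
	// Resolve the path against the page's origin, e.g.
	// "https://example.com/post" + "/favicon.ico".
	return base.ResolveReference(ref).String(), nil
}
```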
Refactoring
With the parsing logic in place, I can now refactor the fetch test and finalise the function implementation:
```go
package main

// import block...

func TestFetchPage(t *testing.T) {
	externalURL := "https://google.com"
	path := "/fetch"
	url := fmt.Sprintf("%s?url=%s", path, externalURL)
	req, _ := http.NewRequest(http.MethodGet, url, nil)
	res := httptest.NewRecorder()

	FetchPageDetails(res, req)

	t.Run("return 200", func(t *testing.T) {
		expected := 200
		got := res.Result().StatusCode

		if got != expected {
			t.Errorf("Expected: %d, got %d\n", expected, got)
		}
	})

	t.Run("confirm Title meta", func(t *testing.T) {
		var bookmark Bookmark
		err := json.NewDecoder(res.Body).Decode(&bookmark)
		if err != nil {
			t.Error("Unable to parse response")
		}

		expected := "Google"
		got := bookmark.Title

		if got != expected {
			t.Errorf("Expected: %s, got: %s", expected, got)
		}
	})
}
```
Router
A router can now be created to provide access to the client. In the `main()` function of the `main.go` file, I created and configured the server multiplexer:
```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

const PORT string = ":8081"

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("GET /fetch", FetchPageDetails)

	fmt.Printf("Server running on port: %s\n", PORT)
	log.Fatal(http.ListenAndServe(PORT, mux))
}
```
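With the server running, the endpoint can be exercised from the command line; the field values will of course depend on the page being fetched:

```sh
curl "http://localhost:8081/fetch?url=https://google.com"
# => {"title":"Google","description":"...","faviconURL":"...","imageURL":"..."}
```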
Usage
This site is built with Astro, and Markdoc is used to manage content. Without going out of the scope of this article, using the API is a three-step process:
- I built a `Bookmark` component in Astro
- I added the `.astro` component to the Markdoc configuration
- In the References section, I wrapped the native list with the Markdoc/Astro component:

```
{% bookmark type="default" %}
- https://developer.mozilla.org/en-US/docs/Web/HTTP/Cookies
- https://datatracker.ietf.org/doc/html/rfc6265
{% /bookmark %}
```
The References section below is the outcome of the first phase of this feature.
Going forward
- How do I handle pages that have anti-bot protection?
- How should I handle a missing `og:image`?
- Where should I deploy? Coolify? Or a general cloud provider?
- How should storage be handled? DB, cache, or both?
- I should use goroutines to manage simultaneous requests from the client (a rough sketch follows this list)
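On that last point, here's a rough sketch of what concurrent fetching could look like, assuming a hypothetical batch endpoint that receives several URLs at once. `fetchAll` and its signature are mine, not part of the service yet:

```go
import (
	"io"
	"net/http"
	"sync"
)

// fetchAll is a hypothetical batch helper: each URL is fetched in its
// own goroutine and the parsed results are collected under a mutex.
func fetchAll(urls []string) []Bookmark {
	var (
		wg        sync.WaitGroup
		mu        sync.Mutex
		bookmarks []Bookmark
	)

	for _, u := range urls {
		wg.Add(1)
		go func(pageURL string) {
			defer wg.Done()

			res, err := http.Get(pageURL)
			if err != nil {
				return // a real version would collect errors too
			}
			defer res.Body.Close()

			body, err := io.ReadAll(res.Body)
			if err != nil {
				return
			}

			if bookmark, err := ParseHTML(body); err == nil {
				mu.Lock()
				bookmarks = append(bookmarks, bookmark)
				mu.Unlock()
			}
		}(u)
	}

	wg.Wait()
	return bookmarks
}
```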