Mutate HTML with PHP and Laravel

Introduction

I recently needed to search through a raw HTML string received from user input and modify it. Using the DOMDocument and DOMXPath APIs, I was able to search the HTML element by element and modify attributes and inner content.

Basic Usage

A basic use of these two APIs is to find all of the h2 tags present in the HTML.

use DOMDocument;
use DOMXPath;

function getHeadings(string $content): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($content);
    $xpath = new DOMXPath($dom);

    $headings     = [];
    $raw_headings = $xpath->evaluate("/html/body//h2");
    for ($i = 0; $i < $raw_headings->length; $i++) {
        $heading = $raw_headings->item($i);

        $headings[] = $heading->textContent;
    }

    return $headings;
}

As shown above, we use the DOMDocument API to build a DOM document from our raw HTML string, then use DOMXPath to query it. The query returns a DOMNodeList that we can loop over.
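Because the HTML comes from user input, it may be malformed or use tags that libxml reports warnings about. The sketch below is one way to handle that, assuming we are happy to discard the parse warnings: it buffers them with libxml_use_internal_errors() and iterates the node list with foreach. The function name getHeadingsSafely is made up for this example.

```php
<?php

// Hypothetical helper (not from the original snippets): collect <h2> text
// while keeping libxml's parse warnings out of PHP's error handler.
function getHeadingsSafely(string $content): array
{
    if (trim($content) === '') {
        return []; // loadHTML() rejects an empty string on PHP 8
    }

    $previous = libxml_use_internal_errors(true); // buffer parse warnings

    $dom = new DOMDocument();
    $dom->loadHTML($content);

    libxml_clear_errors();                  // discard the buffered warnings
    libxml_use_internal_errors($previous);  // restore the earlier setting

    $xpath    = new DOMXPath($dom);
    $headings = [];

    // DOMNodeList is traversable, so foreach works as well as an index loop
    foreach ($xpath->query('/html/body//h2') as $heading) {
        $headings[] = $heading->textContent;
    }

    return $headings;
}

print_r(getHeadingsSafely('<section><h2>First</h2><h2>Second</h2></section>'));
```

If the warnings matter to you, libxml_get_errors() can be read before they are cleared.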

Add ID Attributes To Headings

The true power of this approach shows when we want to modify attributes of that HTML. Let's look at an example that adds "id" attributes to the headings, which gives us anchors we can use for a table of contents section in a blog post.

Here we will use the str() helper function included with the Laravel framework for simplicity. If you're not using Laravel, you would need your own function to convert a string into slug format.
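For anyone outside Laravel, a slug function could look roughly like the sketch below. The name slugify is hypothetical, and this is a simplified, ASCII-oriented stand-in rather than a copy of Laravel's behaviour: accented or non-Latin characters are dropped instead of transliterated.

```php
<?php

// Hypothetical stand-in for Laravel's str($text)->slug(); ASCII-oriented.
function slugify(string $text): string
{
    $slug = strtolower(trim($text));

    // Collapse every run of characters other than a-z and 0-9 into one hyphen
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);

    // Drop any hyphen left at the start or end
    return trim($slug, '-');
}

echo slugify('Hello World'), PHP_EOL;             // hello-world
echo slugify('  PHP & Laravel: Tips! '), PHP_EOL; // php-laravel-tips
```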

use DOMDocument;
use DOMXPath;

function addIdAttributes(string $content): string
{
    $dom = new DOMDocument("1.0", "UTF-8");
    $dom->loadHTML($content);
    $xpath    = new DOMXPath($dom);
    $headings = $xpath->evaluate('/html/body//h2');
    for ($i = 0; $i < $headings->length; $i++) {
        $heading = $headings->item($i);

        $text = str($heading->textContent)->slug();

        $heading->removeAttribute('id');
        $heading->setAttribute('id', $text);
    }

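    // utf8_decode() is deprecated as of PHP 8.2; mb_convert_encoding() performs the same conversion
    return mb_convert_encoding($dom->saveHTML(), 'ISO-8859-1', 'UTF-8');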
}

This function returns the modified HTML with an ID attribute set on every h2 tag. The value of each ID is the slugified text content of that h2 tag. To do this for other heading levels, you could simply query the HTML again and pass the result through the same for loop.
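As an alternative to querying once per heading level, a single XPath expression can match several levels at once. The sketch below does that for h2 and h3 and also collects an array that could feed a table of contents. The function name, the returned array shape, and the inline slug logic are all assumptions made for this example; in a Laravel app you would likely keep str(...)->slug().

```php
<?php

// Hypothetical helper: add ids to <h2> and <h3> in one pass and collect
// table-of-contents entries. Assumes well-formed, ASCII-only input.
function addIdsAndCollectToc(string $content): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($content);
    $xpath = new DOMXPath($dom);

    $toc = [];

    // One location path with a predicate matches both heading levels
    foreach ($xpath->query('/html/body//*[self::h2 or self::h3]') as $heading) {
        $text = $heading->textContent;

        // Simplified slug; swap in str($text)->slug() when using Laravel
        $id = trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($text)), '-');

        $heading->setAttribute('id', $id); // overwrites any existing id
        $toc[] = ['level' => $heading->nodeName, 'id' => $id, 'text' => $text];
    }

    return ['html' => $dom->saveHTML(), 'toc' => $toc];
}

$result = addIdsAndCollectToc('<h2>Getting Started</h2><p>Intro</p><h3>Install PHP</h3>');
print_r($result['toc']);
```

Each entry in 'toc' carries the heading level, the generated id, and the visible text, which is enough to render a nested list of anchor links.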

Output HTML

<!-- HTML before parse -->
<h2>Hello World</h2>
<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>

<!-- HTML after parse -->
<h2 id="hello-world">Hello World</h2>
<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit.</p>
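One thing to be aware of: called with no argument, saveHTML() serialises the whole document, and loadHTML() normally wraps a fragment in a doctype plus html and body tags. If you want only the fragment back, as in the example above, one option is to serialise the children of the body element individually. The helper name bodyInnerHtml is hypothetical.

```php
<?php

// Hypothetical helper: serialise only the children of <body>, so the result
// is a fragment without the doctype, <html>, and <body> wrappers.
function bodyInnerHtml(DOMDocument $dom): string
{
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return ''; // nothing was parsed into a body
    }

    $html = '';
    foreach ($body->childNodes as $child) {
        $html .= $dom->saveHTML($child); // saveHTML() also accepts a single node
    }

    return $html;
}

$dom = new DOMDocument();
$dom->loadHTML('<h2 id="hello-world">Hello World</h2><p>Lorem ipsum</p>');

echo bodyInnerHtml($dom), PHP_EOL;
```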

Mutating Img Tag Attributes

Img tags can be modified in the same way. Here's an example that adds lazy loading to all images in the HTML string.

use DOMDocument;
use DOMXPath;

function lazyloadImages(string $content): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($content);
    $xpath  = new DOMXPath($dom);
    $images = $xpath->evaluate('/html/body//img');
    for ($i = 0; $i < $images->length; $i++) {
        $image = $images->item($i);
        $image->removeAttribute('loading');
        $image->setAttribute('loading', 'lazy');
    }

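    // utf8_decode() is deprecated as of PHP 8.2; mb_convert_encoding() performs the same conversion
    return mb_convert_encoding($dom->saveHTML(), 'ISO-8859-1', 'UTF-8');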
}

Combine Methods Into a Blog Content Pipeline

Pipelines are an underused and under-documented gem within the Laravel framework. In this case we can use them to run a string of content through various pipes and end up with the final formatted HTML string.

Creating the Pipes

For each action in the pipeline, we need to create a class with a handle method that receives the content and a closure for passing the result on to the next pipe.
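To make the handle contract concrete, here is a rough plain-PHP sketch of how a value can be threaded through a list of pipes. It is a simplified illustration, not Laravel's actual implementation; the two pipe classes and the runPipes function are invented for the example.

```php
<?php

// Two hypothetical pipes; each follows the handle($payload, $next) shape.
class TrimContent
{
    public function handle(string $content, Closure $next)
    {
        return $next(trim($content)); // transform, then hand on
    }
}

class WrapInArticle
{
    public function handle(string $content, Closure $next)
    {
        return $next('<article>' . $content . '</article>');
    }
}

// Simplified stand-in for a pipeline runner.
function runPipes(string $content, array $pipes): string
{
    // Build the chain from the last pipe backwards so the first pipe runs first
    $chain = array_reduce(
        array_reverse($pipes),
        fn (Closure $next, string $pipe) => fn (string $payload) => (new $pipe())->handle($payload, $next),
        fn (string $payload) => $payload // final step: return the payload unchanged
    );

    return $chain($content);
}

echo runPipes("  <p>Hi</p>\n", [TrimContent::class, WrapInArticle::class]), PHP_EOL;
// <article><p>Hi</p></article>
```

Each pipe decides when to call $next, so a pipe could also stop the chain early by returning without calling it.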

<?php

namespace App\Pipelines\PostFormatting;

use Closure;
use DOMDocument;
use DOMXPath;

class LazyloadImages
{
    public function handle(string $content, Closure $next)
    {
        $dom = new DOMDocument();
        $dom->loadHTML($content);
        $xpath  = new DOMXPath($dom);
        $images = $xpath->evaluate('/html/body//img');
        for ($i = 0; $i < $images->length; $i++) {
            $image = $images->item($i);
            $image->removeAttribute('loading');
            $image->setAttribute('loading', 'lazy');
        }

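        // utf8_decode() is deprecated as of PHP 8.2; mb_convert_encoding() performs the same conversion
        return $next(mb_convert_encoding($dom->saveHTML(), 'ISO-8859-1', 'UTF-8'));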
    }
}

<?php

namespace App\Pipelines\PostFormatting;

use Closure;
use DOMDocument;
use DOMXPath;

class AddIdsToHeadings
{
    public function handle(string $content, Closure $next)
    {
        $dom = new DOMDocument();
        $dom->loadHTML($content);
        $xpath    = new DOMXPath($dom);
        $headings = $xpath->evaluate('/html/body//h2');
        for ($i = 0; $i < $headings->length; $i++) {
            $heading = $headings->item($i);
            $text    = str($heading->textContent)->slug();

            $heading->removeAttribute('id');
            $heading->setAttribute('id', $text);
        }

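        // utf8_decode() is deprecated as of PHP 8.2; mb_convert_encoding() performs the same conversion
        return $next(mb_convert_encoding($dom->saveHTML(), 'ISO-8859-1', 'UTF-8'));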
    }
}

Finally, we need to instantiate our pipeline, which we can do from a service or action.

<?php

namespace App\Services;

use App\Pipelines\PostFormatting\AddIdsToHeadings;
use App\Pipelines\PostFormatting\LazyloadImages;
use Illuminate\Pipeline\Pipeline;

class PostContentFormatterService
{
    public function format(string $content): string
    {
        $pipes = [
            AddIdsToHeadings::class,
            LazyloadImages::class,
        ];

        return app(Pipeline::class)
            ->send($content)
            ->through($pipes)
            ->thenReturn();
    }
}

Chaining methods like this in a pipeline allows us to organise our code efficiently and makes it easy to understand the flow of what's happening to our string of HTML before it's committed to the database.

We have only scratched the surface of what you can achieve with these APIs in this article. The ability to read and modify HTML in an action or service opens the door to things like content analysis for SEO, and validation and sanitization of your HTML.
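As a small taste of the content analysis idea, the sketch below lists images that have no alt attribute, using the same DOMDocument and DOMXPath approach. The function name imagesMissingAlt is hypothetical, and the sketch assumes well-formed input.

```php
<?php

// Hypothetical check: list the src of every <img> that has no alt attribute.
function imagesMissingAlt(string $content): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($content);
    $xpath = new DOMXPath($dom);

    $sources = [];

    // The predicate [not(@alt)] keeps only images without an alt attribute
    foreach ($xpath->query('/html/body//img[not(@alt)]') as $image) {
        $sources[] = $image->getAttribute('src');
    }

    return $sources;
}

print_r(imagesMissingAlt('<p><img src="a.png" alt="A"><img src="b.png"></p>'));
```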

Copyright © 2024 | bonnick.dev