Building a Hybrid Headless CMS: Integrating Notion with Astro

January 12, 2026

#Astro #Notion #Engineering #TypeScript #CMS

The quest for the perfect personal website stack is a rite of passage for every developer. We want it all:

Extreme Performance: Perfect Lighthouse scores.
Great Developer Experience: Type-safe content schemas.
Low Friction Writing: A CMS that doesn't feel like a chore.

Astro is unbeatable for performance and DX. But writing Markdown files in VS Code isn't always the most inspiring experience for drafting thoughts. Notion, on the other hand, offers a best-in-class writing interface but makes for a terrible frontend (slow, clunky URLs).

The Solution? An Astro Hybrid Loader that pulls content from both the local filesystem (for technical docs) and the Notion API (for flexible blogging), normalizing them into a single, high-performance content engine.

Here is the engineering story of how I built this system, the dragons I slayed (spoiler: Schema Stripping), and the npm packages that saved the day.

The Architecture

Building this required a custom pipeline. I didn't just want to "embed" Notion. I wanted full control over the rendering—meaning custom syntax highlighting, anchor links, and optimized images.

The core innovation here is normalization. By normalizing data at the Content Layer level, my frontend components (like [...slug].astro) don't even know if a post came from Notion or a local file. They just receive a standard data object.

This decoupling allows us to:

Swap Notion for another CMS later without touching frontend code.
Mix and match content sources seamlessly.
Apply global styles (Typography, Shiki themes) to all content uniformly.
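
To make the normalization idea concrete, here is a minimal sketch rather than the site's actual code. The NormalizedPost shape, the two raw input types (LocalFile, NotionPage), and all of their field names are assumptions invented for illustration; the point is only that both sources collapse into one shape before any page component sees them.

```typescript
// Hypothetical shapes: none of these names come from Astro or the Notion SDK.
type NormalizedPost = {
  id: string; // slug used in the URL
  title: string;
  source: "local" | "notion";
  body: string; // Markdown, regardless of origin
};

type LocalFile = { path: string; frontmatter: { title: string }; content: string };
type NotionPage = { pageId: string; title: string; slug: string; markdown: string };

// Local files: derive the slug from the file name.
function fromLocal(file: LocalFile): NormalizedPost {
  const fileName = file.path.split("/").pop() ?? file.path;
  return {
    id: fileName.replace(/\.mdx?$/, ""),
    title: file.frontmatter.title,
    source: "local",
    body: file.content,
  };
}

// Notion pages: the slug is assumed to be stored as a page property.
function fromNotion(page: NotionPage): NormalizedPost {
  return { id: page.slug, title: page.title, source: "notion", body: page.markdown };
}

const posts: NormalizedPost[] = [
  fromLocal({
    path: "src/content/posts/hello-world.md",
    frontmatter: { title: "Hello" },
    content: "# Hello",
  }),
  fromNotion({ pageId: "abc123", title: "From Notion", slug: "from-notion", markdown: "# Hi" }),
];
```

A page template that only consumes NormalizedPost never needs to branch on the source field, which is what makes swapping the CMS later a loader-only change.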

The Stack: Powering the Pipeline

This integration relies on a triad of powerful libraries:

notion-to-md: The Bridge. It converts Notion's block-based API response into standard Markdown strings. This gives us raw text we can process.
shiki: The Painter. We use Shiki for build-time syntax highlighting. Unlike client-side PrismJS, this ships zero JavaScript to the browser.
github-slugger: The Navigator. Essential for generating collision-free anchor IDs for our headers, making the Table of Contents actually clickable.

The Implementation: Custom Hybrid Loader

The heart of the system is the hybridLoader in src/content/config.ts, which iterates through both sources. We moved away from the traditional getStaticPaths approach (which fetches at page generation time) to the new Astro Content Layer API (which fetches at data layer time). This allows for better caching, incremental builds, and type-safe schema validation before a single component is rendered.

Here is a simplified look at the logic:

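import { glob } from "astro/loaders";
import type { Loader, LoaderContext } from "astro/loaders";
import { Client } from "@notionhq/client";
import { NotionToMarkdown } from "notion-to-md";

// Assumption: the integration token lives in an env var named NOTION_TOKEN.
const notion = new Client({ auth: process.env.NOTION_TOKEN });

const hybridLoader = (globPattern: string, notionDatabaseId: string | undefined): Loader => {
    return {
        name: 'hybrid-loader',
        load: async (context: LoaderContext) => {
            // 1. Load local files through Astro's built-in glob loader
            const localLoader = glob({ pattern: globPattern, base: "./src/content/posts" });
            await localLoader.load(context);

            // 2. Load Notion posts (skipped when no database ID is configured)
            if (notionDatabaseId) {
                const n2m = new NotionToMarkdown({ notionClient: notion });
                // Note: one query returns a single page of results; larger databases need pagination
                const response = await notion.databases.query({
                     database_id: notionDatabaseId,
                     filter: { property: "Status", status: { equals: "Published" } }
                });

                for (const page of response.results) {
                    // Convert the page's blocks into a single Markdown string
                    const mdblocks = await n2m.pageToMarkdown(page.id);
                    const mdString = n2m.toMarkdownString(mdblocks);

                    // Transformation magic happens here...
                    // (elided: this step derives `slug` and `props` from the page's properties)
                    context.store.set({
                        id: slug,
                        data: { ...props },
                        body: mdString.parent
                    });
                }
            }
        }
    }
}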
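
One thing the simplified loop glosses over: the Notion API returns database query results in pages, signalling more data through has_more and next_cursor in the response and accepting start_cursor in the request. A single call may therefore miss posts once the database grows. Below is a minimal sketch of draining every page. It is written against a generic QueryFn type (a hypothetical name, not part of the Notion SDK) and a fake data source so it stays self-contained.

```typescript
// Shape of one paginated response; mirrors the has_more / next_cursor
// fields the Notion API uses for list endpoints.
type Page<T> = { results: T[]; has_more: boolean; next_cursor: string | null };

// Hypothetical adapter type: wrap the real query call in a function like this.
type QueryFn<T> = (startCursor?: string) => Promise<Page<T>>;

// Keep requesting pages until the source reports that nothing is left.
async function collectAll<T>(query: QueryFn<T>): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined = undefined;
  do {
    const page: Page<T> = await query(cursor);
    all.push(...page.results);
    cursor = page.has_more && page.next_cursor ? page.next_cursor : undefined;
  } while (cursor !== undefined);
  return all;
}

// Fake two-page data source standing in for the Notion client.
const fakeQuery: QueryFn<string> = async (startCursor) =>
  startCursor === undefined
    ? { results: ["post-a", "post-b"], has_more: true, next_cursor: "c1" }
    : { results: ["post-c"], has_more: false, next_cursor: null };
```

With the real client, the adapter would pass start_cursor alongside database_id and filter in the notion.databases.query call and return the response unchanged.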

The "Gotchas": Engineering War Stories

It wasn't all smooth sailing. Integrating two different content philosophies (Structured Blocks vs Raw Markdown) led to some interesting bugs. Here are two critical issues I encountered and how I solved them.

Challenge #1: The Case of the Missing Table of Contents

I implemented a fancy extraction logic using marked to pull all headings (h1, h2...) from the markdown and calculate their slugs. I verified the data was perfect in the loader.

But when the page rendered? Nothing.

/* The data I expected */
{ "customHeadings": [{ "depth": 1, "text": "Intro", "slug": "intro" }] }

/* The data Astro gave me */
{ "customHeadings": undefined }

The Root Cause: Astro's Content Layer is strict. It uses Zod schemas to validate data, and Zod object schemas drop unrecognized keys by default. If you add a property (like customHeadings) to your data object but forget to define it in the schema, it is silently stripped out so that the data matches the declared types.

The Fix: Explicitly defining the shape of our extracted metadata in config.ts.

schema: z.object({
    // ... standard fields
    // The missing piece!
    customHeadings: z.array(z.object({
        depth: z.number(),
        slug: z.string(),
        text: z.string()
    })).optional(),
}),

Once defined, the data flowed through perfectly.
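
For context, here is a rough sketch of the kind of extraction that produces customHeadings. It is a stand-in, not the loader's real code: it uses a plain regex instead of marked, and a simplified ASCII-only slug function with a collision counter instead of github-slugger, so the exact slugs may differ from what those libraries produce.

```typescript
type Heading = { depth: number; slug: string; text: string };

// Simplified slugger: lowercases, drops punctuation, and appends -1, -2...
// to repeated slugs. An approximation of github-slugger, not a port of it.
function makeSlugger(): (text: string) => string {
  const seen = new Map<string, number>();
  return (text) => {
    const base = text
      .toLowerCase()
      .trim()
      .replace(/[^a-z0-9\s-]/g, "")
      .replace(/\s+/g, "-");
    const count = seen.get(base) ?? 0;
    seen.set(base, count + 1);
    return count === 0 ? base : `${base}-${count}`;
  };
}

function extractHeadings(markdown: string): Heading[] {
  const slug = makeSlugger();
  const headings: Heading[] = [];
  let inFence = false;
  for (const line of markdown.split("\n")) {
    // Skip fenced code so "# comment" lines are not mistaken for headings.
    if (/^\s*(```|~~~)/.test(line)) {
      inFence = !inFence;
      continue;
    }
    if (inFence) continue;
    const match = /^(#{1,6})\s+(.+?)\s*#*\s*$/.exec(line);
    if (match) {
      const text = match[2];
      headings.push({ depth: match[1].length, slug: slug(text), text });
    }
  }
  return headings;
}

const sample = ["# Intro", "```sh", "# not a heading", "```", "## Setup", "## Setup"].join("\n");
const result = extractHeadings(sample);
```

Each object matches the depth / slug / text shape declared in the schema, which is why the array survives validation once the field is defined.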

Challenge #2: Interactive "Scroll Spy"

A static Table of Contents is boring. I wanted it to update as the user reads. Since we are using Astro, we want to avoid heavy React hooks if possible. I implemented a lightweight Vanilla JS solution using IntersectionObserver.

The logic is simple but effective:

Observe all headings on the page.
Trigger when a heading enters a band near the top of the viewport (below a 100px offset, within roughly the top third).
Toggle an .active class on the corresponding sidebar link.

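/* src/pages/knowledge/[...slug].astro */
const tocLinks = document.querySelectorAll('.toc-link');
const observer = new IntersectionObserver((entries) => {
    entries.forEach(entry => {
        if (!entry.isIntersecting) return;
        const id = entry.target.getAttribute('id');
        const link = document.querySelector(`.toc-link[href="#${id}"]`);
        if (!link) return; // Heading without a matching TOC entry
        // Clear the previous highlight, then mark the current link
        tocLinks.forEach(l => l.classList.remove('active'));
        link.classList.add('active');
    });
}, { rootMargin: "-100px 0px -66%" }); // Shrinks the root to a band near the top of the viewport

// Start observing the headings (the selector is an assumption about the page markup)
document.querySelectorAll('h1[id], h2[id], h3[id]').forEach(h => observer.observe(h));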

This keeps the sidebar in sync with the user's current reading position without attaching any scroll-event listeners.
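
One edge case worth sketching: several headings can sit inside the observed band at once, for example when two short sections follow each other. Below is a minimal, DOM-free helper that picks the topmost intersecting heading and otherwise keeps the previous one. The HeadingEntry type and the pickActiveId name are hypothetical, introduced only for this illustration.

```typescript
// Minimal stand-in for the fields we would read off an IntersectionObserverEntry.
type HeadingEntry = { id: string; isIntersecting: boolean; top: number };

function pickActiveId(entries: HeadingEntry[], previousId: string | null): string | null {
  const visible = entries
    .filter((entry) => entry.isIntersecting)
    .sort((a, b) => a.top - b.top);
  // No heading in the band: keep highlighting the section being read.
  return visible.length > 0 ? visible[0].id : previousId;
}

const first = pickActiveId(
  [
    { id: "setup", isIntersecting: true, top: 240 },
    { id: "intro", isIntersecting: true, top: 120 },
    { id: "faq", isIntersecting: false, top: 900 },
  ],
  null,
);
const second = pickActiveId([{ id: "faq", isIntersecting: false, top: 700 }], first);
```

Inside the observer callback, each entry could be mapped to this shape using entry.target.id, entry.isIntersecting, and entry.boundingClientRect.top. The callback only receives entries whose intersection state changed, so treat this as a heuristic rather than a full picture of the page.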