ASAP · AI Soon As Possible · AI & tech, delivered fastest

How to Make a Music Video with AI: A 4-Step Guide Using Suno, GPT Image, and Kling

ASAP
2026-06-18 · 4 min read

With no code and no camera, just three AI tools and a video editor, you can make an animated music video. Suno writes the song, GPT Image draws the artwork, Kling animates it, and CapCut handles the editing. It takes four steps: (1) make the music, (2) create character and shot images, (3) animate them with Kling, and (4) edit in CapCut. The key is to "lock in a single character image to keep things consistent," so that the same character and the same mood carry through even as the shots change.

1. The Overall Flow and What You'll Need

An AI music video is built on a pipeline of "music → images → animation → editing." Each stage has its own tool that does the job well, and stitching them together produces a finished piece.

Here is what you'll need.

  1. Suno — generates the song from just lyrics and a style (music)
  2. GPT Image (ChatGPT) — character and scene keyframes (still images)
  3. Kling — turns still images into video (image-to-video)
  4. CapCut — connecting shots, subtitles, color grading, and exporting

2. STEP 1 — Making the Music with Suno

First, enter a "style" and "lyrics" into Suno's custom mode to generate the song. For the style, put in the genre, mood, and BPM; for the lyrics, add structure tags like [Verse] and [Chorus]. For example, you might write the following.

[Style] dreamy summer citypop, airy female vocal, warm synths, 92 BPM
[Lyrics]
[Verse] ...
[Chorus] ...

For Reels, keep it short at 35–45 seconds, and to make captioning and editing easier, get a separate instrumental version as well. Generate several songs and pick the one with the most addictive hook.
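To plan the edit, it helps to know how many beats and bars fit into your target length. The sketch below does that arithmetic, assuming a constant tempo in 4/4 time. The 92 BPM is just the example value from the style prompt; the tempo Suno actually produces may differ from what you asked for, so treat it as a placeholder.

```python
def song_grid(length_s: float, bpm: float, beats_per_bar: int = 4) -> dict:
    """Beat/bar timing for a song of length_s seconds at a steady bpm (4/4 by default)."""
    beat_s = 60.0 / bpm              # seconds per beat
    bar_s = beat_s * beats_per_bar   # seconds per bar
    total_beats = length_s * bpm / 60.0
    return {
        "beat_s": round(beat_s, 3),
        "bar_s": round(bar_s, 3),
        "beats": int(total_beats),                  # whole beats that fit
        "bars": int(total_beats / beats_per_bar),   # whole bars that fit
    }

# Check the three ends of the 35-45 second Reels window at 92 BPM.
for length in (35, 40, 45):
    print(length, song_grid(length, 92))
```

At 92 BPM a bar lasts about 2.6 seconds, so a 40-second song gives you roughly 15 full bars to distribute across your shots.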

3. STEP 2 — Creating Characters and Shots with GPT Image

This is where most of your video's consistency is decided. Lock in a single character image (front view) first, then build every subsequent shot from that image so the character doesn't drift. Write the character description exactly the same way every time.

front-facing full body, [hair / outfit / expression] fixed, simple background, authentic Japanese anime 2D cel style,
not overly idealized, no text, vertical 9:16

Attach your finalized character image to a new prompt and generate each shot with "this exact character, but this time [scene]." Generate 2–3 images per shot and pick the best one.
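Because the character description must match word for word, one option is to keep it in a single place and paste it into every shot prompt from there instead of retyping it. Here is a small Python sketch of that idea. The character wording and scenes are made-up placeholders, and `shot_prompt` is a hypothetical helper that only builds text. You still attach the finalized image in ChatGPT by hand.

```python
# Hypothetical character sheet: replace with your own fixed description.
CHARACTER = "short silver bob, oversized yellow raincoat, calm smile"
# Style wording taken from the base prompt in this step.
STYLE = "authentic Japanese anime 2D cel style, not overly idealized, no text, vertical 9:16"

def shot_prompt(scene: str) -> str:
    """Build a shot prompt that reuses the exact same character and style text."""
    return f"this exact character ({CHARACTER}), but this time {scene}, {STYLE}"

scenes = [
    "standing on a rooftop at sunset",
    "riding a night train, city lights outside the window",
]
for scene in scenes:
    print(shot_prompt(scene))
```

Only the scene text changes from shot to shot. Everything that defines the character stays byte-for-byte identical.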

4. STEP 3 — Animating with Kling

Feed each still image into Kling's image-to-video mode to make a 5-second clip per shot. Keep the motion prompt simple: a camera move (push-in, pan, tilt) plus a short character action. Then add negative phrasing to prevent morphing (faces melting or smearing).

[Motion] slow push-in, hair sways in the breeze, subtle, cinematic
[Negative] morphing, distortion, deformed face, flickering
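Keeping the shot list as data makes it easy to reuse the same negative prompt on every clip and to confirm that the clips cover the song. This sketch assumes 5-second clips and the 35–45 second Reels target from STEP 1. The file names are hypothetical. The code only organizes prompts and does not call Kling.

```python
from dataclasses import dataclass

NEGATIVE = "morphing, distortion, deformed face, flickering"  # reused on every clip
CLIP_S = 5  # seconds per Kling clip, as used in this guide

@dataclass
class Shot:
    image: str   # keyframe from STEP 2 (hypothetical file name)
    motion: str  # camera move plus a short action

    def prompts(self) -> dict:
        """Everything you paste into Kling for this shot."""
        return {"image": self.image, "motion": self.motion, "negative": NEGATIVE}

def plan_length(shots: list, target: tuple = (35, 45)) -> tuple:
    """Total running time of the raw clips and whether it lands in the target window."""
    total = len(shots) * CLIP_S
    return total, target[0] <= total <= target[1]

shots = [
    Shot(f"shot{i:02d}.png", "slow push-in, hair sways in the breeze, subtle, cinematic")
    for i in range(1, 9)
]
print(plan_length(shots))  # (40, True): 8 shots x 5 s
```

This total is raw footage before trimming. Once you cut to the beat, some clips will get shorter, so a little extra length is useful.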

If you're worried about the face changing from shot to shot, register the front-view image with Kling's element (character) registration so the same face is kept across shots. To connect shots smoothly, use the last frame of the previous shot as the start frame of the next.
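If your clips are saved locally, ffmpeg is one way to grab the last frame of the previous clip. The article doesn't mention ffmpeg, so this assumes you have it installed. The sketch builds the command with `-sseof -1`, which starts reading one second before the end of the input, and `-update 1`, which keeps overwriting a single image file until it holds the final frame. `last_frame_cmd` and `extract_last_frame` are hypothetical helper names.

```python
import shlex
import subprocess

def last_frame_cmd(clip: str, out_png: str) -> list:
    """ffmpeg arguments that write the final frame of `clip` to `out_png`."""
    return [
        "ffmpeg", "-y",      # overwrite the output image if it exists
        "-sseof", "-1",      # seek to 1 second before the end of the input
        "-i", clip,
        "-update", "1",      # overwrite one image instead of numbering frames
        out_png,
    ]

def extract_last_frame(clip: str, out_png: str) -> None:
    """Run the command; requires ffmpeg on PATH."""
    subprocess.run(last_frame_cmd(clip, out_png), check=True)

# Show the command that would be run, without executing it.
print(shlex.join(last_frame_cmd("shot01.mp4", "shot02_start.png")))
```

Upload the resulting PNG as the start image of the next shot in Kling.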

5. STEP 4 — Editing and Finishing in CapCut

Finally, load the video clips and music into CapCut and weave them into a single piece. Turn on beat detection and sync your cuts to the beat to bring out the rhythm, and place your climax shots on the chorus for maximum impact.
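CapCut's beat detection marks the beats for you. For a steady tempo, the timing behind it is simple arithmetic, and listing the cut times can help you decide which shots land on the chorus. This sketch assumes a constant tempo and a known time for the first downbeat (`first_beat_s`, a value you would read off the waveform). By default it places a cut every bar (4 beats).

```python
def cut_points(bpm: float, length_s: float, first_beat_s: float = 0.0,
               beats_per_cut: int = 4) -> list:
    """Cut timestamps in seconds, one every `beats_per_cut` beats, for a steady tempo."""
    step = 60.0 / bpm * beats_per_cut
    points = []
    k = 0
    # Multiply instead of repeatedly adding to avoid accumulating float error.
    while first_beat_s + k * step <= length_s:
        points.append(round(first_beat_s + k * step, 2))
        k += 1
    return points

print(cut_points(92, 40)[:4])  # [0.0, 2.61, 5.22, 7.83]
```

At 92 BPM one bar is about 2.61 seconds, so a 5-second Kling clip easily covers a one-bar cut but falls just short of two bars (about 5.22 seconds).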

Finish in the following order.

  1. Apply the same color-grading filter to every clip (unifying the tone—the key thing that separates amateur from pro)
  2. Keep subtitles to a minimum (one line for the hook if needed) and lock the watermark in place
  3. Match the end to the beginning for an infinite loop (boosts replay rate)
  4. Export at 1080×1920 (9:16), 30fps or higher
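If you export several versions, a tiny check against the settings above can catch a wrong preset before you post. This is only a sketch. It checks numbers you read off the export dialog or the file's metadata, and `check_export` is a hypothetical helper that does not talk to CapCut.

```python
from math import gcd

def check_export(width: int, height: int, fps: float) -> list:
    """Return a list of problems with the export settings; an empty list means OK."""
    problems = []
    g = gcd(width, height)
    if (width // g, height // g) != (9, 16):
        problems.append(f"aspect is {width // g}:{height // g}, expected 9:16")
    if (width, height) != (1080, 1920):
        problems.append(f"resolution is {width}x{height}, expected 1080x1920")
    if fps < 30:
        problems.append(f"frame rate {fps} is below 30fps")
    return problems

print(check_export(1080, 1920, 30))  # []
print(check_export(1920, 1080, 24))  # landscape preset at 24fps: three problems
```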

6. Going One Step Further

The key to raising your production quality isn't tricks—it's consistency. Unify the character, the color grading, and the world-building, and even with different shots it will look like a single cohesive work, which is what creates the impression that it was "well made." Posting the same video to Reels, TikTok, and Shorts together can multiply your reach several times over.

It doesn't have to be perfect from the start. Once you take a short 8-shot loop all the way to completion, the next one will go much faster.


References: Suno · Kling AI · Claude Code Official Docs
