
What 1M-View Scripts Actually Say: A Linguistic Teardown of 165 Viral Videos

We read every word of 3,633 transcripts and split them by view tier. Mega-viral scripts use longer sentences, ask 58% more questions, and swear about half as often as long-tail content. Plus a first-word leaderboard: the openers with the highest median views (and the ones that bury you).

April 28, 2026·Updated April 28, 2026·15 min read
  • 9,326 hooks analyzed (first-word study)
  • 3,633 full transcripts (script-level study)
  • 6.7% of mega-virals contain profanity (vs 12.2% in the long tail)
  • 14.0 words per sentence in mega-virals (vs 10.2 in the long tail)

Most "how to write a viral script" advice is folk wisdom. "Be punchy." "Use second-person." "Be controversial." "Stop with the long sentences." Almost none of it has been tested against actual viral content.

We ran two parallel analyses across the corpus. For the first-word study we used 9,326 videos with substantial opening text (transcript or hook-text combined). For the script-level study (sentence length, questions, profanity, pronouns) we used the 3,633 videos with full transcripts populated, since those metrics genuinely require the full script. Then we ran the measurements across three view tiers (under 100K, 100K-1M, 1M+) to see what actually changes as scripts scale into the viral tier.
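The tier split can be sketched in a few lines. This is a minimal illustration, not the pipeline used for the study: the function name and labels are hypothetical, and the handling of the exact boundaries (a video with exactly 100,000 or 1,000,000 views) is an assumption, since the article only names the three ranges.

```python
from collections import Counter

def view_tier(views: int) -> str:
    """Map a view count to one of the three tiers named in the article.

    Assumption: boundaries are inclusive on the lower edge of each tier,
    so exactly 100,000 views counts as T2 and exactly 1,000,000 as T3.
    """
    if views < 100_000:
        return "T1 (under 100K)"
    if views < 1_000_000:
        return "T2 (100K-1M)"
    return "T3 (1M+)"

# Toy view counts, purely illustrative.
sample_views = [4_200, 99_999, 100_000, 750_000, 28_100_000]
tier_counts = Counter(view_tier(v) for v in sample_views)
print(tier_counts)
```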

The patterns are not what the advice columns say.

The three measurable differences

Mega-viral scripts have longer sentences, ask more questions, and swear less than long-tail scripts. Words per sentence: 10.2 → 14.0 (+37%). Questions per video: 1.16 → 1.84 (+58%). Profanity rate: 12.2% → 6.7% (cut in half). The viral tier sounds more like a thoughtful explainer than a hot-takes podcast.
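The headline percentages are relative changes between the long-tail and mega-viral averages. A quick check, using only the rounded tier averages quoted above (so the last digit may differ from figures computed on unrounded data):

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100

print(f"{pct_change(10.2, 14.0):.1f}")   # 37.3  (words per sentence)
print(f"{pct_change(1.16, 1.84):.1f}")   # 58.6  (questions per video)
print(f"{pct_change(12.2, 6.7):.1f}")    # -45.1 (profanity rate)
```

On these inputs the profanity rate falls by about 45%, which is the sense in which it is "cut in half".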

Here's the full teardown, side by side.

Long-tail script DNA (under 100K)
  • Words per sentence: 10.2 (punchy fragments)
  • Questions per script: 1.16 (mostly statements)
  • Profanity rate: 12.2% (1 in 8 scripts swears)
  • You-to-I ratio: 2.40x (heavy direct address)

Mega-viral script DNA (1M+)
  • Words per sentence: 14.0 (+37% longer, more declarative)
  • Questions per script: 1.84 (+58% more curiosity loops)
  • Profanity rate: 6.7% (about half as much swearing)
  • You-to-I ratio: 1.91x (more balanced pronouns)

Each of these patterns is real, measurable, and points the same direction. Long-tail scripts are short, punchy, profanity-tolerant, and "you-heavy." Mega-viral scripts are longer, more curious, cleaner, and more balanced.


Finding 1: Sentence length grows with virality

We split each transcript by punctuation and counted words per sentence. The pattern across tiers is clear: as scripts scale up, sentences get longer.
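A minimal sketch of that measurement, assuming the split rule given in the Methodology section (sentence boundaries at `.`, `!`, `?`) and whitespace-delimited words. The function name is hypothetical.

```python
import re

def words_per_sentence(transcript: str) -> float:
    """Total words divided by sentence count, splitting on . ! ?"""
    # Runs of terminal punctuation ("?!", "...") count as one boundary.
    sentences = [s for s in re.split(r"[.!?]+", transcript) if s.strip()]
    if not sentences:
        return 0.0
    total_words = sum(len(s.split()) for s in sentences)
    return total_words / len(sentences)

fragments = "Autoimmune disease. 80% women. Wild."
explainer = (
    "80% of autoimmune diseases happen to women, which are diseases where "
    "the immune system attacks the body that it's supposed to protect, why?"
)
print(words_per_sentence(fragments))  # about 1.67
print(words_per_sentence(explainer))  # 23.0
```

One caveat of a pure punctuation split: abbreviations and decimals ("Dr. Tim", "6.7") are treated as sentence boundaries, which pulls the average down for scripts that contain them.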

Words per sentence (+37% in mega-virals)
  • Under 100K: 10.2
  • 100K-1M: 10.3
  • 1M+: 14.0 (highest)

The "speak in fragments" / "punchy = viral" advice is wrong as you scale. Mega-viral content reads more like an explainer. A script that opens with "80% of autoimmune diseases happen to women, which are diseases where the immune system attacks the body that it's supposed to protect, why?" works better than one that opens with "Autoimmune disease. 80% women. Wild."

Why: a plausible explanation is that longer sentences carry more information per second of attention. Once a video is being amplified well beyond its original audience, the viewers are broader and less primed for staccato delivery. Complete thoughts travel further than fragments.

The Tier 2 sentence length (10.3) is basically tied with Tier 1 (10.2). The shift only happens at the very top tier. So the pattern isn't "post longer sentences and grow"; it's "the videos that break past 1M tend to be written more completely."


Finding 2: Viral scripts ask 58% more questions

Question marks per transcript:

Questions per script (+58% in mega-virals)
  • Under 100K: 1.16
  • 100K-1M: 1.68
  • 1M+: 1.84 (highest)

This one tracks with the Investigator-hook finding from our previous study. The "wait, what?" framing scales. Long-tail content makes statements. Viral content asks questions and then answers them, which keeps the viewer in a curiosity loop for the full runtime.
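The question metric is a count of `?` characters per transcript, averaged within each tier. A small sketch under that definition; the function names and the `(tier, transcript)` row shape are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

def questions_per_script(transcript: str) -> int:
    """Count question marks in one transcript."""
    return transcript.count("?")

def mean_questions_by_tier(rows):
    """rows: iterable of (tier_label, transcript) pairs."""
    counts = defaultdict(list)
    for tier, transcript in rows:
        counts[tier].append(questions_per_script(transcript))
    return {tier: mean(values) for tier, values in counts.items()}

# Toy rows, not taken from the dataset.
toy_rows = [
    ("T1", "Here's why this happens."),
    ("T3", "80% of them happen to women. Why?"),
    ("T3", "What are your salary expectations? Ready? Great."),
]
print(mean_questions_by_tier(toy_rows))
```

Because the count depends on punctuation, it is only as reliable as the question marks the auto-generated transcripts contain.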

The 28M-view autoimmune video does this twice in its first 10 seconds: "80% of autoimmune diseases happen to women. Why?" Two questions stacked, both answered later in the video. Long-tail health content tends to lead with the answer ("Here's why women get autoimmune diseases more often"), which kills the curiosity gap.


Finding 3: Mega-virals swear half as much

The most counterintuitive finding in the study.

% of scripts with profanity (cut nearly in half at the top)
  • Under 100K: 12.2%
  • 100K-1M: 12.5%
  • 1M+: 6.7% (lowest)

12.2% of long-tail scripts contain at least one instance of fuck/shit/damn/bitch/etc. 12.5% of solid hits do. Then it drops to 6.7% of mega-virals. The "controversial language is viral fuel" theory does not survive the data.
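A sketch of how a profanity rate like this can be computed. It uses the word list from the Methodology section, with Python's `\b` word boundary standing in for the `\m` / `\M` markers there. The function names are hypothetical.

```python
import re

# Word list as given in the Methodology section; \b replaces \m and \M.
PROFANITY = re.compile(
    r"\b(fuck|shit|damn|crap|bitch|asshole|bullshit)\b", re.IGNORECASE
)

def has_profanity(transcript: str) -> bool:
    """True if at least one listed word appears as a whole word."""
    return PROFANITY.search(transcript) is not None

def profanity_rate(transcripts) -> float:
    """Share of transcripts with at least one match."""
    transcripts = list(transcripts)
    if not transcripts:
        return 0.0
    return sum(has_profanity(t) for t in transcripts) / len(transcripts)

toy = ["Well, damn.", "This is a clean script.", "Total BULLSHIT!", "Fine."]
print(profanity_rate(toy))  # 0.5
```

Because the pattern matches whole words only, inflected forms such as "shitty" or "fucking" are not counted as written, so rates measured this way are a lower bound.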

A few likely reasons:

  1. Profane content can be throttled or demonetized on major platforms. Mega-viral videos have, by definition, cleared that filter.
  2. The audience for a 100K video is your niche. The audience for a 10M video is everyone. Profanity narrows the audience.
  3. The 'controversial = viral' creators are over-indexed in the long tail because their content polarizes early. Polarization caps reach.
  4. Brands and big accounts (which dominate the mega-viral tier) stay clean for sponsorship reasons. Their reach reflects that discipline.

If you've been swearing in your scripts as a deliberate "edge" play, the data suggests it's costing you reach. Save the profanity for places where the audience self-selects in (podcasts, Substacks, niche channels). On the open feed, clean wins.


Finding 4: The you-to-I ratio collapses (counterintuitively)

You'd expect viral content to use "you" more (talk-to-the-camera, parasocial, second-person engagement). The data flips that intuition.

| Metric | T1 (long tail) | T2 (100K-1M) | T3 (1M+) |
|---|---|---|---|
| Avg "you" / "your" / "you're" | 8.2 | 7.0 | 6.9 |
| Avg "I" / "I'm" / "me" / "my" | 6.8 | 6.6 | 5.5 |
| You-to-I ratio | 2.40 | 1.81 | 1.91 |

Long-tail scripts hammer "you" (2.4x more than "I"). Mega-virals balance the two more evenly (1.91x). And in absolute terms, mega-virals use BOTH pronouns less than long-tail content.

The interpretation: long-tail content is over-indexed on direct address ("you should...", "you need to...", "you're probably making this mistake"), which reads as personal advice. Mega-viral content spends more time on the subject matter and less on speaker/viewer dynamics. The script is about the thing, not about the relationship between the speaker and the viewer.
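A note on how the ratio row is read. Dividing the tier averages in the table gives 8.2 / 6.8 ≈ 1.21 for the long tail, not 2.40, so the published ratio is presumably not a quotient of averages. One possible reading, and it is only an assumption, is that the ratio is computed per script and then averaged. The sketch below shows both calculations on toy data; the patterns follow the Methodology section and all function names are hypothetical.

```python
import re
from statistics import mean

YOU_PATTERN = re.compile(r"\b(you|your|yours|youre|y'all)\b", re.IGNORECASE)
I_PATTERN = re.compile(r"\b(i|im|me|my|mine)\b", re.IGNORECASE)

def pronoun_counts(transcript: str):
    """Return (you_count, i_count) for one transcript."""
    return (
        len(YOU_PATTERN.findall(transcript)),
        len(I_PATTERN.findall(transcript)),
    )

def ratio_of_means(transcripts) -> float:
    """Average 'you' count divided by average 'I' count."""
    counts = [pronoun_counts(t) for t in transcripts]
    return mean(c[0] for c in counts) / mean(c[1] for c in counts)

def mean_of_ratios(transcripts) -> float:
    """Average of per-script ratios; scripts with no 'I' words are skipped."""
    counts = [pronoun_counts(t) for t in transcripts]
    return mean(you / i for you, i in counts if i > 0)

toy_scripts = [
    "You should know your limits.",
    "I think you and I agree, and my point stands for me.",
]
print(ratio_of_means(toy_scripts))  # 0.75
print(mean_of_ratios(toy_scripts))  # 0.25
```

The two methods can disagree sharply on the same scripts, so the ratio row and the average rows should not be expected to reconcile by simple division.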


The first-word podium

For this analysis we expanded the sample to 9,326 videos by combining transcripts with hook-text where transcripts weren't populated. The first word of every script, extracted, normalized, and ranked by median views (min 50 videos per word).
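The extract, normalize, and rank step can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: the function names and the `(opening_text, views)` row shape are hypothetical, and normalization here strips only leading and trailing ASCII punctuation so that contractions such as "Here's" and "I'm" stay intact.

```python
import string
from collections import defaultdict
from statistics import median

def first_word(text: str):
    """First non-empty word, lowercased, with edge punctuation stripped."""
    for token in text.split():
        word = token.strip(string.punctuation).lower()
        if word:
            return word
    return None

def leaderboard(rows, min_videos: int = 50):
    """rows: iterable of (opening_text, views).

    Returns (word, median_views, video_count) tuples, highest median first,
    keeping only words that open at least min_videos videos.
    """
    views_by_word = defaultdict(list)
    for text, views in rows:
        word = first_word(text)
        if word is not None:
            views_by_word[word].append(views)
    ranked = [
        (word, median(views), len(views))
        for word, views in views_by_word.items()
        if len(views) >= min_videos
    ]
    return sorted(ranked, key=lambda row: row[1], reverse=True)

# Toy rows with a lowered threshold so the example produces output.
toy_rows = [
    ("Hey, Dr. Tim here", 100), ("Hey you", 300),
    ("Okay so", 10), ("Okay, then", 30),
    ("What is this?", 999),
]
print(leaderboard(toy_rows, min_videos=2))
```

Ranking by median rather than mean keeps a single 50M-view outlier from carrying a word to the top, and the minimum-count filter drops words with too few videos to compare.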

  1. Hey: 39,450 median views. Greeting + parasocial intimacy
  2. There: 33,174 median views. 'There's a...' opener forces the reveal
  3. What: 32,800 median views. Question word, curiosity hook

The bottom shelf: openers to avoid (median views)

  • Okay: 1,899
  • Alright: 4,594
  • If: 4,855
  • It's: 4,994
  • I'm: 6,730

"Hey" beats "Okay" by 20x at the median. A single word at the start of your script is doing that much work.

The full first-word leaderboard, ranked by median views:

First word vs median views (9,326-video sample, min 50 per word)

  1. Hey: 39.5K (96 videos · highest median opener)
  2. There: 33.2K (75 videos)
  3. What: 32.8K (121 videos · question word)
  4. All: 29.0K (59 videos · totalizing frame)
  5. Did: 28.0K (60 videos · question word)
  6. We: 27.8K (116 videos · inclusive frame)
  7. The: 25.4K (639 videos · declarative noun start)
  8. This: 23.2K (696 videos)
  9. Here: 19.9K (62 videos)
  10. Let: 19.2K (57 videos)
  11. How: 15.3K (87 videos · question word)
  12. Why: 14.8K (81 videos · question word)
  13. When: 14.2K (60 videos · POV territory)
  14. These: 13.2K (51 videos)
  15. A: 11.0K (97 videos)
  16. So: 10.8K (168 videos · filler)
  17. One: 9.2K (54 videos)
  18. I: 9.0K (456 videos · self-referential)
  19. Here's: 8.7K (68 videos)
  20. You: 8.1K (192 videos · direct address)
  21. I'm: 6.7K (104 videos · self-referential)
  22. Do: 5.8K (65 videos)
  23. It's: 5.0K (50 videos)
  24. If: 4.9K (283 videos · conditional)
  25. Alright: 4.6K (79 videos · filler)
  26. Okay: 1.9K (72 videos · lowest opener)

The "Hey" surprise. "Hey" was the highest-median first word in the entire dataset (39,450 median views). Conventional wisdom says hooks should never start with a casual greeting. The data disagrees. The likely reason: when a creator opens with "Hey, Dr. Tim here..." or "Hey, what's up bro?", they're establishing parasocial intimacy fast, which suits algorithms that reward retention.

Question words mostly win. "What" and "Did" both sit in the top five, and "How" and "Why" land mid-table at 11th and 12th. "Do" is the exception at 22nd. Starting your video with a question is one of the more consistent positive signals in the data.

Self-referential openers lose. "I" (about 9,000 median), "I'm" (6,730), "My" (6,389). Starting with yourself is a structural mistake. The viewer doesn't care who you are yet. They care about whether the next 30 seconds is worth their attention.

"Stop" is dead. 3,147 median views. The "Stop drinking 8 glasses of water" / "Stop doing X" framing got beat into the ground by 2024-2025 productivity content. The audience tunes out the second they see it.

"Alright" is filler. 4,594 median. Verbal warm-ups translate to dead time on a video. Get to the point.



What 1M-view scripts actually look like

Real openings from the 165 mega-virals in our dataset:

Hey, Dr. Tim here, and although this might sound like a good idea, it could lead to something called phytophotodermatitis, a skin reaction that occurs when lemons, limes, or other plants...

@(Health expert) · 69.8M views

Opens with "Hey", introduces the speaker by credential, then immediately pivots to a specific named medical condition. The first sentence runs past 30 words. Zero filler.

How to answer one of the trickiest job interview questions, what are your salary expectations? Great, so we just have one last question for you here...

@(Career creator) · 50.7M views

Opens with "How to", embeds the question inside the hook, then sets up a roleplay scenario. The whole opener is one rhetorical question + one transition.

80% of autoimmune disease, which are diseases where the immune system attacks the body that it's supposed to protect, 80% of them happen to women. Why?

@(Health expert) · 28.1M views

A 25-word first sentence with a definition embedded mid-sentence, followed by a one-word question. This is the "long, declarative, question-tagged" pattern in pure form.

Um, someone never told me that when you look at someone on LinkedIn, it tells... Oh my gosh, I'm helping you change this right now.

@(Tech tip) · 30.7M views

Casual opener ("Um"), pivots immediately to a specific platform-action insight, then drops the implied promise ("I'm helping you change this"). Notice the 15+ word run-on first sentence.

The common thread: complete sentences, specific subject matter, a question or implied stake within the first sentence or two, and very little time spent on the speaker.


How to apply this

  1. Stop opening with 'Stop'. Median views for 'Stop'-led hooks: 3,147. Median views for 'Hey'-led hooks: 39,450. That's roughly a 12.5x gap on a single word.
  2. Lead with a question word when you can. 'What' and 'Did' are both top-five openers, and 'How' and 'Why' land mid-table. The question structure compounds with the curiosity hook archetype that wins at every tier.
  3. Don't open with 'I' or 'I'm'. Self-referential openers underperform. Save the personal context for sentence two or three, after the viewer has a reason to care who you are.
  4. Write longer sentences, not shorter ones. The 'punchy = viral' advice does not hold at the top tier. Mega-virals average 14.0 words per sentence vs 10.2 in the long tail and 10.3 at 100K-1M. Complete thoughts beat fragments.
  5. Ask more questions in your scripts. Mega-virals average 1.84 questions per video. Long-tail averages 1.16. Stack two questions in your first 10 seconds and let the rest of the video answer them.
  6. Cut profanity unless your niche specifically rewards it. The 6.7% profanity rate in mega-virals is about half the long-tail rate. Platforms may quietly throttle profane content. Clean scripts travel further.


The bottom line

The "viral script playbook" most creators follow (short fragments, lots of "you", strategic profanity, "Stop doing X" hooks) is a long-tail playbook that produces long-tail results.

The actual viral pattern is the opposite: complete sentences, balanced pronouns, no profanity, opening with a greeting or a question rather than an imperative or a self-introduction. The scripts that scale to 1M+ views read more like a thoughtful explainer than a polarizing hot take.

This isn't a coincidence. The audience at 1M+ views is everyone, not just your niche. Content that travels has to land with strangers. Strangers don't respond to "Stop drinking 8 glasses of water." They respond to "80% of autoimmune diseases happen to women. Why?"


Methodology

Two samples, two purposes. For the script-level findings (sentence length, questions, profanity, pronouns) we used 3,633 videos with full transcripts populated and at least 5 words of text. Tier breakdown: Tier 1 (under 100K) 2,749 transcripts, Tier 2 (100K-1M) 719, Tier 3 (1M+) 165. For the first-word study we expanded to 9,326 videos by using COALESCE(transcript, hook), the first word of the spoken transcript when available, falling back to the hook field when not.
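The fallback and the five-word filter can be sketched in Python. The function names are hypothetical. Note one property of `COALESCE` that the sketch mirrors: it falls back only when the transcript is null, not when it is an empty string. Whether missing transcripts are stored as nulls or as empty strings in the underlying data is not stated, so treat this as an assumption.

```python
from typing import Optional

def opening_text(transcript: Optional[str], hook: Optional[str]) -> Optional[str]:
    """Mirror COALESCE(transcript, hook): first non-null value wins."""
    return transcript if transcript is not None else hook

def has_full_transcript(transcript: Optional[str], min_words: int = 5) -> bool:
    """Script-level sample filter: transcript present with at least 5 words."""
    return transcript is not None and len(transcript.split()) >= min_words

print(opening_text(None, "Hey, what's up bro?"))        # falls back to the hook
print(opening_text("Okay so today", "Hook text here"))  # transcript wins
print(has_full_transcript("Too short"))                 # False
```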

Why transcripts aren't on every video: Transcripts are populated on roughly 34% of videos in our broader 10,718-video corpus. The other ~66% are videos that the transcription pipeline hasn't covered: sound-off skits, music-only edits, very old archived content, and a handful of transcription failures. Pure visual or sound-driven videos are under-represented in the script-level analysis. The first-word analysis covers 87% of the corpus because we can fall back to hook-text when the transcript is missing.
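The coverage figures can be checked directly from the sample sizes quoted in this section. A quick arithmetic sketch (the variable names are illustrative):

```python
CORPUS = 10_718
TRANSCRIPT_SAMPLE = 3_633
FIRST_WORD_SAMPLE = 9_326
TIER_COUNTS = {"T1": 2_749, "T2": 719, "T3": 165}

transcript_coverage = TRANSCRIPT_SAMPLE / CORPUS
first_word_coverage = FIRST_WORD_SAMPLE / CORPUS

print(f"{transcript_coverage:.1%}")                     # 33.9%
print(f"{first_word_coverage:.1%}")                     # 87.0%
print(sum(TIER_COUNTS.values()) == TRANSCRIPT_SAMPLE)   # True
```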

Linguistic measurement:

  • Words/sentence: total word count divided by sentence count (split on ., !, ?).
  • Questions: count of ? characters per transcript.
  • Profanity: regex match for \m(fuck|shit|damn|crap|bitch|asshole|bullshit)\M (case-insensitive). A transcript is "with profanity" if at least one match occurs.
  • Pronoun counts: regex word-boundary matches for (you|your|yours|youre|y'all) and (i|im|me|my|mine) (case-insensitive).
  • First word: leading non-empty word of the transcript (or hook field, when transcript is missing), stripped of punctuation, lowercased.

Why these metrics: these are the most consistent, automatable cuts available across an MDX-friendly dataset. Semantic patterns (sentiment, named-entity recognition, topic) would require a deeper NLP pipeline and a separate study.

Known limits:

  • The "Hey" finding is partly creator-skewed. A small number of high-performing health and finance creators consistently open with "Hey, Dr./bro..." which inflates the cell. Treat it as directional, not as a guarantee.
  • Tier 3 sample is 165 transcripts. Cell-level percentages (especially the profanity drop from 12.5% to 6.7%) should be treated as directional, but the size of the gap and the consistent direction across all four metrics give us confidence it's not noise.
  • Cross-posted videos appear once per platform.
  • Transcripts are auto-generated and have minor accuracy issues. Mishears that change profanity counts (e.g., "duck" → "fuck") are rare but possible.
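To put a rough size on the "directional" caveat for the Tier 3 profanity rate, one option is a Wilson score interval on each tier's proportion. This is a sketch, not part of the original analysis. The raw counts are not published, so the counts below are back-calculated from the percentages and tier sizes (6.7% of 165 ≈ 11 scripts, 12.5% of 719 ≈ 90 scripts) and are assumptions.

```python
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Assumed counts, inferred from the published percentages.
t3_low, t3_high = wilson_interval(11, 165)   # Tier 3 (1M+)
t2_low, t2_high = wilson_interval(90, 719)   # Tier 2 (100K-1M)

print(f"T3: {t3_low:.1%} to {t3_high:.1%}")
print(f"T2: {t2_low:.1%} to {t2_high:.1%}")
```

With only 165 transcripts in the top tier, the interval is wide (roughly 4% to 12% under these assumed counts), which is why the gap is best read as a direction rather than a precise effect size.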