Prompt Engineering Frameworks Compared: CO-STAR, RISEN, RACE, CREATE, APE, and STOKE
You know you should structure your prompts. Every guide on how to write better AI prompts says the same thing: be specific, add context, define the output. But "be specific" is vague advice — which is ironic.
Frameworks solve this. They give you a repeatable structure so you don't reinvent the wheel every time you open a chat window. The problem? There are now dozens of them, and most people either pick one at random or ignore them entirely.
This guide compares six of the most widely used prompt engineering frameworks side by side: CO-STAR, RISEN, RACE, CREATE, APE, and STOKE. You'll see what each one does well, where it falls short, and — most importantly — which one to use for the task in front of you.
Why Frameworks Exist (And Why Most People Ignore Them)
The average prompt contains one thing: the task. "Write a blog post about remote work." "Summarize this document." "Generate test cases." That's like handing a contractor a blueprint that says "build a house" with no dimensions, no materials list, and no site plan.
Frameworks exist to fill in everything the AI would otherwise guess at. Context, constraints, audience, format, success criteria — the information that separates a useful output from a generic one.
So why do most people skip them? Three reasons. First, they don't know which framework to use. Second, the frameworks feel like overhead when you just want a quick answer. Third, nobody has shown them the difference it makes on their specific task.
This guide fixes all three. By the end, you'll know exactly which framework fits which situation — and you'll have seen them all applied to the same real task so you can judge the difference yourself.
The 6 Frameworks at a Glance
| Framework | Components | Best For | Complexity | Learning Curve |
|---|---|---|---|---|
| CO-STAR | Context, Objective, Style, Tone, Audience, Response | Content creation, marketing copy | Medium | Low |
| RISEN | Role, Instructions, Steps, End Goal, Narrowing | Multi-step technical tasks | Medium-High | Medium |
| RACE | Role, Action, Context, Expect | Quick structured prompts | Low | Very Low |
| CREATE | Character, Request, Examples, Adjustments, Type, Extras | Detailed creative work | High | Medium |
| APE | Action, Purpose, Expectation | Simple, fast prompts | Very Low | Very Low |
| STOKE | Situation, Task, Objective, Knowledge, Examples | Analytical and domain-expert tasks | Medium-High | Medium |
Each framework makes tradeoffs. Simpler frameworks (APE, RACE) get you 80% of the value in 20% of the time. More detailed ones (CREATE, STOKE) give you finer control but require more upfront thinking. The right choice depends on your task, not on which framework is "best."
CO-STAR
Context, Objective, Style, Tone, Audience, Response
CO-STAR was popularized through Singapore's GovTech prompt engineering guides and quickly gained traction because it's intuitive and covers the dimensions that matter most for content-oriented tasks.
When to use it: Marketing copy, blog posts, emails, social media content — anything where tone, audience, and style directly shape quality. If your output needs to sound a specific way, CO-STAR is a strong default.
Strengths:
- Explicit tone and audience fields prevent the most common content failure (wrong voice for the reader)
- The Style component lets you reference specific writing styles or formats
- Low overhead — most people can fill in all six fields in under a minute
Weaknesses:
- No built-in mechanism for multi-step reasoning or complex logic
- Doesn't include a field for examples or reference material
- The distinction between Style and Tone can feel redundant for simple tasks
Example prompt:
Context: We're a B2B SaaS company launching a new API monitoring product.
Our existing customers are mid-market engineering teams (50-200 developers).
We've been in private beta for 3 months with 92% satisfaction scores.
Objective: Write an announcement email that drives beta users to upgrade
to the paid tier within the first week of general availability.
Style: Concise, technical, data-driven. Similar to Stripe's product
announcements.
Tone: Confident but not salesy. Respectful of the reader's time.
Audience: Engineering managers and senior developers who evaluated
the product during beta. They already know the basics — don't re-explain
the product.
Response: Email format. Subject line + preview text + body.
Under 300 words for the body. Include one clear CTA button.
CO-STAR works well because it forces you to separate what you're writing from who you're writing for and how it should sound. For content tasks, those distinctions matter more than anything else.
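Because the six fields are fixed, CO-STAR also lends itself to a reusable template you can fill in once and reuse across tasks. A minimal sketch in Python (the function and variable names are ours, not part of the framework):

```python
# A minimal sketch of CO-STAR as a reusable template.
def co_star(context, objective, style, tone, audience, response):
    fields = [
        ("Context", context),
        ("Objective", objective),
        ("Style", style),
        ("Tone", tone),
        ("Audience", audience),
        ("Response", response),
    ]
    # Render each component on its own labeled line, in CO-STAR order.
    return "\n".join(f"{label}: {text}" for label, text in fields)

prompt = co_star(
    context="B2B SaaS company launching an API monitoring product.",
    objective="Announcement email driving beta users to the paid tier.",
    style="Concise, technical, data-driven.",
    tone="Confident but not salesy.",
    audience="Engineering managers who evaluated the beta.",
    response="Email: subject + preview text + body under 300 words.",
)
```

Keeping the components as named arguments makes it obvious when one is missing, which is the whole point of using a framework.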
RISEN
Role, Instructions, Steps, End Goal, Narrowing
RISEN shines when your task has a clear sequence of operations. It was designed for scenarios where the AI needs to follow a specific process, not just produce an output in a certain style.
When to use it: Multi-step analysis, technical documentation, research tasks, code reviews, process-driven workflows. If your task has a natural order of operations, RISEN structures it explicitly.
Strengths:
- The Steps component forces you to define the process, which prevents the AI from taking shortcuts
- End Goal keeps the output anchored to a measurable outcome
- Narrowing lets you add constraints after defining the broad task, which matches how humans think
Weaknesses:
- Can feel heavy for simple tasks — if your task is one step, RISEN adds unnecessary structure
- No explicit audience or tone fields, so content quality is less controlled
- The Role field overlaps with what most models already infer from context
Example prompt:
Role: You are a senior security engineer conducting a dependency audit.
Instructions: Analyze the provided package.json file and identify
dependencies with known CVEs, outdated major versions, or
unmaintained status (no commits in 12+ months).
Steps:
1. List all direct dependencies with their current version
2. Check each against known CVE databases (flag severity: critical, high, medium)
3. Identify packages more than 2 major versions behind latest
4. Flag packages with no GitHub activity in the last 12 months
5. Suggest replacement packages where alternatives exist
End Goal: Produce a prioritized remediation plan that an engineering
team can execute in a single sprint, starting with critical security risks.
Narrowing: Focus only on direct dependencies (not devDependencies).
Ignore packages with only low-severity CVEs. Format output as a
markdown table with columns: Package | Issue | Severity | Action.
RISEN's strength is the Steps component. By defining the sequence explicitly, you prevent the AI from skipping steps or reordering them in ways that lose important information. This is conceptually related to chain of thought prompting — you're externalizing the reasoning process.
RACE
Role, Action, Context, Expect
RACE is the minimalist option. Four components, no overhead. It's the framework you reach for when you need a structured prompt in 30 seconds.
When to use it: Quick tasks, daily workflows, situations where speed matters more than fine-tuning. RACE is also a good starting point when you're not sure how complex your prompt needs to be — start with RACE, then upgrade to a heavier framework if the output isn't good enough.
Strengths:
- Fastest to write — four fields, each one sentence
- Easy to remember and teach to non-technical teammates
- The Expect field forces you to define what "good" looks like, which on its own noticeably improves output quality
Weaknesses:
- No style, tone, or audience control — you're relying on the AI to infer these
- No mechanism for multi-step processes
- Can produce inconsistent results on complex tasks because there isn't enough structure to constrain the model
Example prompt:
Role: Act as an experienced technical writer.
Action: Rewrite the following error message to be user-friendly
and actionable: "Error 5023: NULL_REFERENCE_EXCEPTION in
module auth.handler at line 247"
Context: This error appears in a consumer mobile app when users
try to log in after a session timeout. Users are non-technical.
Average age is 35-55. They need to know what happened and
what to do next — not what went wrong technically.
Expect: A short error message (under 20 words) plus a one-sentence
recovery instruction. Friendly tone, no technical jargon.
RACE is underrated. For 70% of daily prompting tasks, four well-chosen sentences outperform a vague paragraph. If you're new to frameworks, start here and pick up the terminology from a prompt engineering glossary as you go.
CREATE
Character, Request, Examples, Adjustments, Type, Extras
CREATE is the most detailed of the six frameworks. Its explicit Examples component gives it built-in few-shot prompting support; among the other frameworks here, only STOKE offers the same.
When to use it: Tasks where output format, voice, or structure must match a specific reference. Brand copywriting, documentation following a style guide, code generation matching existing patterns, translations preserving specific nuances.
Strengths:
- The Examples field is powerful — showing the AI what good output looks like is consistently the highest-leverage prompting technique
- Adjustments let you fine-tune after defining the broad task, catching edge cases
- Type explicitly specifies the output format, reducing ambiguity
Weaknesses:
- Requires the most effort to set up — finding or writing good examples takes time
- Can feel over-engineered for simple tasks
- Six components with overlapping scope (Adjustments vs. Extras can be confusing)
Example prompt:
Character: You are a senior product copywriter at a developer tools company.
You write in a style that's technical but warm — like Vercel's blog posts.
Request: Write a changelog entry for our new real-time collaboration feature.
Examples:
Good: "Workspaces now sync in real-time. Every edit, comment, and cursor
movement appears instantly for all team members — no refresh needed.
Works across all plan tiers."
Bad: "We're excited to announce our amazing new collaboration feature!
This game-changing update will revolutionize how your team works together!"
Adjustments: Keep it under 80 words. Lead with the capability, not
the emotion. Include one specific technical detail (WebSocket-based,
sub-100ms latency). No exclamation marks.
Type: Changelog entry formatted as: feature name (H3), description
paragraph, one bullet list of technical specs (3 items max).
Extras: This is for our public changelog at /changelog. The audience
is developers who are already paying customers. They care about
what changed and whether it affects their workflow.
CREATE's Examples component is its biggest differentiator. In practice, few-shot prompting (giving the AI examples of the desired output) produces more consistent results than instructions alone.
APE
Action, Purpose, Expectation
APE is the lightest framework on this list. Three components. It works because it forces you to answer the three most important questions: What should the AI do? Why? And what should the result look like?
When to use it: Rapid iteration, brainstorming, simple one-shot tasks, situations where you'll review and refine the output anyway. APE is also useful as a mental model — even when you use a heavier framework, the APE questions are always the foundation.
Strengths:
- Almost zero overhead — faster than writing a freeform prompt in most cases
- Purpose field is unique among lightweight frameworks and prevents aimless outputs
- Easy to chain multiple APE prompts for multi-step workflows
Weaknesses:
- No role, audience, or context — the AI fills in blanks with defaults
- Not enough structure for complex, nuanced, or high-stakes tasks
- No example mechanism, so format consistency is harder to guarantee
Example prompt:
Action: Generate 10 subject lines for a product launch email.
Purpose: We're announcing a new AI code review tool to our existing
user base of 5,000 developers. The email needs a high open rate
because the launch discount expires in 48 hours.
Expectation: Each subject line should be under 50 characters, use
no emojis, and take a different angle (curiosity, urgency, benefit,
social proof, question). Label each with its angle.
APE is the gateway framework. If you're not using any structure in your prompts today, starting with APE will produce an immediate quality improvement. Once you outgrow it, move to RACE or CO-STAR.
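Because each APE prompt is so small, chaining them for multi-step workflows is straightforward: the output of one prompt becomes the input of the next. A rough sketch, where `ask` stands in for whatever LLM client you actually use (the helper names here are ours):

```python
# Render an APE prompt from its three components.
def ape(action, purpose, expectation):
    return (
        f"Action: {action}\n"
        f"Purpose: {purpose}\n"
        f"Expectation: {expectation}"
    )

# Chain two APE prompts: generate options, then narrow them down.
# `ask` is a placeholder for a real LLM call (e.g. an API client).
def generate_and_shortlist(ask):
    ideas = ask(ape(
        action="Generate 10 subject lines for a product launch email.",
        purpose="Maximize open rate; the launch discount expires in 48 hours.",
        expectation="Under 50 characters each, labeled by angle.",
    ))
    return ask(ape(
        action=f"Pick the strongest 3 subject lines from:\n{ideas}",
        purpose="Narrow the list to candidates worth A/B testing.",
        expectation="3 lines, each with a one-sentence rationale.",
    ))
```

Each link in the chain stays small enough to debug on its own, which is why lightweight frameworks suit multi-call workflows better than heavyweight ones.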
STOKE
Situation, Task, Objective, Knowledge, Examples
STOKE was built specifically to address the gap between "what you want the AI to do" and "what the AI needs to know to do it well." The Knowledge component is what sets it apart — it's where you inject domain expertise, constraints, and specialized information that the AI wouldn't have otherwise.
For a deep dive, see the full STOKE framework guide.
When to use it: Domain-expert tasks, technical writing, analysis where accuracy depends on specialized knowledge, any task where the AI's default training data isn't sufficient.
Strengths:
- Knowledge component fills the biggest gap in most prompts — domain context
- Objective separates "what to do" (Task) from "what to achieve" (Objective), preventing drifting outputs
- Examples component provides few-shot learning, similar to CREATE
Weaknesses:
- Requires the user to have domain knowledge to fill in the K component
- Overlaps with CREATE on examples and with CO-STAR on context
- Higher setup time than RACE or APE
Example prompt:
Situation: I'm a fintech startup's head of compliance. We're preparing
for a SOC 2 Type II audit next quarter. Our engineering team has been
focused on features and hasn't documented security controls consistently.
Task: Create a gap analysis template that maps our current engineering
practices to SOC 2 Trust Services Criteria.
Objective: Identify the top 10 gaps that would cause audit findings,
so we can prioritize remediation in the next 8 weeks.
Knowledge:
- SOC 2 Trust Services Criteria: Security, Availability, Processing
Integrity, Confidentiality, Privacy
- Our stack: AWS (EKS, RDS, S3), GitHub Actions CI/CD, Terraform IaC
- We have: MFA enforced, encrypted RDS, VPC isolation, basic CloudWatch
- We lack: formal change management, documented incident response,
vendor risk assessments, data retention policies
- Auditor will be from a Big 4 firm, expect formal documentation
Examples:
Gap format: "[TSC Category] — [Specific Control] — [Current State] —
[Required State] — [Remediation Effort: S/M/L]"
Example: "Security — CC6.1 Logical Access — MFA enabled but no
quarterly access reviews — Need documented quarterly reviews with
evidence trail — M"
STOKE's Objective component deserves emphasis. In CO-STAR, the Objective is "what you want the AI to produce." In STOKE, the Objective is "what the output needs to achieve in the real world." That distinction pushes the AI toward outputs that are useful, not just correct.
Head-to-Head: Same Task, 6 Frameworks
To see the real differences, let's apply all six frameworks to the same task: Write a product announcement email for a B2B SaaS launch.
CO-STAR Version
Context: DataSync Pro is a new data integration platform launching March 15.
We've had 200 beta users for 4 months. Target market: mid-market companies
with 3-10 data sources that currently use manual ETL scripts.
Objective: Write an announcement email that drives recipients to start
a free trial within the first week of launch.
Style: Clean, benefit-driven, inspired by Linear's product emails.
Short paragraphs, clear hierarchy.
Tone: Professional and direct. Quietly confident — let the product
speak through specific capabilities, not adjectives.
Audience: CTOs, data engineering leads, and senior developers at
companies with 50-500 employees. They've tried other tools and found
them either too complex or too limited.
Response: Subject line + preview text + email body (under 250 words).
One primary CTA button, one secondary text link.
RISEN Version
Role: You are a B2B SaaS product marketing manager with 10 years
of experience writing launch emails for developer tools.
Instructions: Write a product announcement email for DataSync Pro,
a data integration platform launching March 15 after 4 months of beta.
Steps:
1. Open with the core problem (manual ETL is slow and error-prone)
2. Introduce DataSync Pro as the solution in one sentence
3. List 3 key differentiators (visual pipeline builder, 50+ connectors,
sub-minute sync times)
4. Include one beta user proof point (200 beta users, 94% would recommend)
5. Close with clear CTA to start free trial
End Goal: An email that achieves >25% open rate and >5% click-through
rate among a list of 3,000 qualified mid-market engineering leads.
Narrowing: Under 250 words body. No jargon the reader needs to look up.
No pricing in this email — the goal is trial starts, not purchases.
RACE Version
Role: Act as a SaaS product marketing expert.
Action: Write a product launch announcement email for DataSync Pro,
a data integration platform targeting mid-market engineering teams.
Context: Launching March 15 after 4-month beta with 200 users.
Key features: visual pipeline builder, 50+ connectors, sub-minute
sync. Audience: CTOs and data engineers at 50-500 employee companies.
Expect: Subject line + email body under 250 words. Benefit-led,
professional tone. One clear CTA to start a free trial.
CREATE Version
Character: You are the head of product marketing at a B2B SaaS company.
You write emails that developers actually read — technical enough to be
credible, concise enough to respect their time.
Request: Write the launch announcement email for DataSync Pro, our data
integration platform going live March 15.
Examples:
Good subject: "DataSync Pro is live — connect your data sources in minutes"
Bad subject: "Exciting news! Our amazing new product is here!!!"
Good opening: "Manual ETL scripts break at 3 AM. DataSync Pro replaces
them with visual pipelines that sync in under 60 seconds."
Bad opening: "We're thrilled to announce the launch of our revolutionary
new data integration platform!"
Adjustments: Under 250 words body. Lead with the problem, not the product.
Include one specific metric from beta (200 users, 94% recommend). No
exclamation marks in the body.
Type: Marketing email with subject line, preview text, body, and CTA button.
Extras: Sending to 3,000 mid-market engineering leads. Many have evaluated
competing tools (Fivetran, Airbyte) and found them too expensive or complex.
APE Version
Action: Write a product launch announcement email for DataSync Pro,
a B2B data integration platform.
Purpose: Drive free trial signups from mid-market engineering leads
in the first week after our March 15 launch. We had 200 beta users
with 94% satisfaction.
Expectation: Subject line + body under 250 words. Professional,
benefit-led tone. One CTA to start a free trial. No hype language.
STOKE Version
Situation: We're launching DataSync Pro on March 15 — a data integration
platform for mid-market engineering teams. We've run a 4-month beta with
200 users (94% would recommend). Sending to 3,000 qualified leads who
currently use manual ETL or have evaluated Fivetran/Airbyte.
Task: Write the product launch announcement email. Subject line, preview
text, and body under 250 words with one primary CTA.
Objective: Drive 150+ free trial signups in the first week. Secondary
goal: establish DataSync Pro's positioning as "the mid-market alternative"
to enterprise-priced competitors.
Knowledge:
- Key differentiators: visual pipeline builder (no code), 50+ connectors,
sub-minute sync, 70% cheaper than Fivetran at mid-market scale
- Audience pain: manual ETL breaks overnight, existing tools are priced
for enterprise, setup takes weeks not hours
- Beta proof: average setup time 23 minutes, 3.2x faster than previous
tools, zero data loss incidents in 4 months
- Competitor weakness: Fivetran's mid-market pricing pushes $2k+/month
Examples:
Tone reference: Linear's product announcements — clean, specific,
no filler. One feature per sentence. Metrics over adjectives.
What stands out: CO-STAR and CREATE produce the most controlled tone. RISEN gives the AI the clearest process. STOKE provides the deepest domain context. RACE and APE are fastest to write but leave more to the AI's discretion.
The Decision Matrix: Which Framework When
| Task Type | Simple/Quick | Medium Complexity | High Complexity |
|---|---|---|---|
| Content & copy | APE | CO-STAR | CREATE |
| Technical/analytical | RACE | RISEN | STOKE |
| Creative/brainstorming | APE | CO-STAR | CREATE |
| Process-driven | RACE | RISEN | RISEN |
| Domain-expert | RACE | STOKE | STOKE |
| Code generation | APE | RISEN | STOKE |
| Data analysis | RACE | RISEN | STOKE |
Rules of thumb:
- Need tone/audience control? CO-STAR or CREATE
- Need to define a process? RISEN
- Need domain expertise injected? STOKE
- Need few-shot examples? CREATE or STOKE
- Need it fast? APE or RACE
- Not sure? Start with RACE. Upgrade if the output isn't specific enough.
Can You Combine Frameworks?
Yes — and experienced prompt engineers do this regularly. Frameworks are mental models, not rigid templates. The most effective approach is often to cherry-pick components from different frameworks.
Common combinations that work well:
- CO-STAR + Examples (borrow from STOKE/CREATE): Add a few-shot examples section to your CO-STAR prompt for content tasks that need specific format consistency.
- RISEN + Knowledge (borrow from STOKE): Add domain knowledge to a RISEN prompt when your multi-step process requires specialized context.
- RACE + Steps (borrow from RISEN): When a task is simple enough for RACE but has a clear sequence, add a Steps component.
The goal isn't framework purity — it's information completeness. If your prompt gives the AI everything it needs to produce exactly the output you want, the framework did its job. If you find yourself reaching for components from other frameworks, that's a signal you've outgrown the one you started with.
The same logic applies when you combine frameworks with techniques like chain of thought or prompt chaining — the framework structures the input, the technique structures the reasoning.
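One way to make cherry-picking concrete is to treat framework components as data rather than as a fixed template. A sketch under that assumption (the function is illustrative; the component names can come from any framework):

```python
# Treat framework components as data: pass any mix of fields and
# render them in the order given (Python dicts preserve insertion order).
def build_prompt(**components):
    return "\n\n".join(
        f"{name.replace('_', ' ').title()}: {text}"
        for name, text in components.items()
    )

# RACE with a Steps component borrowed from RISEN:
prompt = build_prompt(
    role="Act as a release engineer.",
    action="Draft a rollback runbook for our main web service.",
    context="Kubernetes deployment with a blue-green strategy.",
    steps="1) Freeze deploys 2) Shift traffic back 3) Verify health checks",
    expect="A numbered checklist an on-call engineer can follow.",
)
```

Swapping components in and out costs one keyword argument, which makes it cheap to test whether, say, adding Steps actually improves your output.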
Framework Performance Across Models
Frameworks are model-agnostic by design, but models respond to them differently in practice. Here's what we've observed testing these frameworks across GPT-4o, Claude, and Gemini:
GPT-4o follows framework structure faithfully. It's particularly strong with RISEN's step-by-step process and CO-STAR's tone/style instructions. It sometimes over-indexes on the Role component, producing unnecessarily formal outputs if you assign an expert role.
Claude excels with STOKE's Knowledge component — it integrates domain context naturally rather than treating it as a separate section. Claude also handles CREATE's Examples component well, closely mirroring the style and structure of provided examples.
Gemini performs best with explicit constraints. The Narrowing component in RISEN and the Adjustments in CREATE help keep Gemini's outputs focused. It can be more verbose than the other models, so include word count limits regardless of framework.
Across all models, the components that produce the most consistent improvement are: concrete examples (from CREATE or STOKE), explicit output format (from CO-STAR's Response or CREATE's Type), and clear constraints (from RISEN's Narrowing). If you only add one thing to your freeform prompts, add an example of what good output looks like.
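That last piece of advice is easy to mechanize: append a short Examples section to any existing freeform prompt. A sketch (the helper name and good/bad pairing are ours):

```python
# Append a few-shot Examples section to an existing prompt.
# One good and one bad sample is often enough to pin down format and tone.
def with_examples(prompt, good, bad=None):
    section = f"\n\nExamples:\nGood: {good}"
    if bad is not None:
        section += f"\nBad: {bad}"
    return prompt + section

prompt = with_examples(
    "Write a changelog entry for our new real-time collaboration feature.",
    good="Workspaces now sync in real time. No refresh needed.",
    bad="We're excited to announce our amazing new feature!",
)
```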
Try It Yourself
The best way to learn frameworks is to apply them. Take a prompt you used this week — one that produced mediocre results — and rewrite it using CO-STAR, RISEN, or STOKE. Compare the outputs.
If you want to skip the manual structuring, Promplify lets you select a framework and have your prompt automatically restructured using real LLM optimization. It supports all six frameworks covered here (plus nine more), so you can test how different frameworks affect your specific prompts across GPT-4o, Claude, Gemini, and DeepSeek.
The frameworks in this guide are starting points. The more you use them, the more you'll develop intuition for which components matter most for your specific work — and you'll stop needing the acronyms entirely.
Frequently Asked Questions
What is the best prompt engineering framework?
There is no single best framework. CO-STAR is strongest for content and marketing tasks where tone and audience control matter. RISEN excels at multi-step technical processes. STOKE is best when domain expertise needs to be injected. For quick tasks, APE or RACE provide 80% of the value with minimal effort. Choose based on your task type and complexity, not on which framework is most popular.
Do prompt engineering frameworks work with all AI models?
Yes. Frameworks like CO-STAR, RISEN, RACE, CREATE, APE, and STOKE work with GPT-4o, Claude, Gemini, and other major LLMs. The underlying principle — giving the AI structured, complete information — is model-agnostic. However, models respond slightly differently to specific components. Claude handles domain knowledge (STOKE's K component) particularly well, while GPT-4o follows step-by-step processes (RISEN's Steps) most faithfully.
Can I combine multiple prompt engineering frameworks?
Absolutely. Experienced prompt engineers routinely borrow components from different frameworks. Common effective combinations include CO-STAR with added few-shot examples (from STOKE or CREATE), RISEN with a Knowledge section (from STOKE), and RACE with explicit Steps (from RISEN). The goal is information completeness, not framework purity.
What is the difference between CO-STAR and STOKE frameworks?
CO-STAR (Context, Objective, Style, Tone, Audience, Response) emphasizes how the output should sound — with explicit Style, Tone, and Audience components. STOKE (Situation, Task, Objective, Knowledge, Examples) emphasizes what the AI needs to know — with dedicated Knowledge and Examples components. Use CO-STAR when voice and audience targeting matter most (marketing, content). Use STOKE when domain expertise and output accuracy matter most (analysis, technical writing).
How do I choose a prompt framework for my task?
Start with task complexity. For simple, quick tasks, use APE (3 components) or RACE (4 components). For medium-complexity content tasks, use CO-STAR. For multi-step technical tasks, use RISEN. For tasks requiring domain expertise or few-shot examples, use STOKE or CREATE. If you're unsure, start with RACE and upgrade to a more detailed framework if the output isn't specific enough.
Ready to Optimize Your Prompts?
Try Promplify free — paste any prompt and get an AI-rewritten, framework-optimized version in seconds.
Start Optimizing