Gemini Prompt Engineering: Master AI Photo Generation
Introduction: The Paradigm Shift in Multimodal Generative Reasoning

The contemporary landscape of artificial intelligence has transitioned from simple pattern recognition to complex generative synthesis, a shift most visibly manifested in the domain of text-to-image generation. Within this rapidly evolving sphere, Google’s Gemini ecosystem—underpinned by the Imagen 3 diffusion architecture—represents a fundamental departure from the “keyword-soup” prompting methodologies that defined early generative models. This report provides an exhaustive, expert-level analysis of the technical dynamics, prompting linguistic structures, and operational constraints of the Gemini image generation suite.
Unlike its predecessors, which often relied on raw statistical associations between text tags and visual noise, the Gemini architecture integrates a sophisticated Large Language Model reasoning layer before the image generation phase. This “reasoning engine” acts as an interpretative intermediary, translating natural language intent into the structured latent space of the diffusion model. This architectural nuance necessitates a complete re-evaluation of prompt engineering strategies. Professionals must now move beyond mere descriptors to construct narrative frameworks, leveraging the model’s deep semantic understanding to control complex variables such as lighting physics, camera optics, and emotional atmosphere.
This analysis draws upon a wide array of technical documentation, community research, and empirical testing data to deconstruct the “Nano Banana” (Gemini 2.5 Flash) and “Nano Banana Pro” (Gemini 3) models. It explores the sociotechnical phenomena surrounding their release, the viral trends they have spawned, and the precise linguistic formulas required to achieve photorealistic, commercial-grade output. Furthermore, it addresses the significant technical limitations inherent in the current interface—specifically regarding aspect ratios and safety guardrails—and provides verified workarounds for professional workflows.
Architectural Taxonomy: The Gemini Image Generation Suite
To master prompt engineering within this ecosystem, one must first understand the distinct models available and their specific capabilities. The nomenclature surrounding these models has been fluid, shaped by both official product branding and organic community adoption of internal codenames.
The “Nano Banana” Phenomenon: Nomenclature and Virality
A peculiar and defining characteristic of the current Gemini era is the widespread use of the moniker “Nano Banana.” This term, which might appear frivolous to outside observers, represents a specific, high-performance iteration of the model that gained traction through anonymous benchmarking platforms.
Research indicates that “Nano Banana” originated as an internal placeholder name used by Google DeepMind during anonymized public testing on the LMArena leaderboard. During this phase, the model—camouflaged to prevent bias—consistently outperformed competitors, ranking #1 in image editing and generation tasks. When the model was officially released as Gemini 2.5 Flash Image, the community, having bonded with the playful codename, refused to abandon it. Google subsequently embraced this branding, incorporating banana emojis into the official interface and social media communications.
The significance of this goes beyond trivia; the nickname maps onto a genuine architectural distinction. “Nano Banana” (Gemini 2.5 Flash) is optimized for low latency and high throughput, making it the preferred engine for rapid iteration, social media content, and viral trends. In contrast, “Nano Banana Pro” (Gemini 3 Pro Image) incorporates a “Thinking” process—a chain-of-thought reasoning step that occurs before pixel generation begins—allowing for higher fidelity, complex instruction following, and superior text rendering.
| Feature | Gemini 2.5 Flash Image (“Nano Banana”) | Gemini 3 Pro Image (“Nano Banana Pro”) |
|---|---|---|
| Core Architecture | Natively multimodal, optimized for speed | Advanced reasoning layer + Imagen 3 backbone |
| Processing Latency | Low (Real-time capable) | High (Includes “Thinking” phase) |
| Reasoning Capability | Direct prompt-to-image mapping | Chain-of-Thought planning before generation |
| Context Window | Standard multimodal context | Enhanced world knowledge & 1M+ token context |
| Input Modality | Text, Single Image | Text, Multi-Image (up to 14 reference images) |
| Text Rendering | Basic legible text | State-of-the-art, multi-language support |
| Primary Use Case | Memes, rapid concepting, social media | Infographics, commercial design, photorealism |
The Reasoning Engine and “Thinking” Mode
The introduction of Gemini 3 Pro’s “Thinking” mode marks a critical evolution in prompt engineering. In traditional diffusion models, the prompt is tokenized and directly influences the noise prediction process. In Gemini 3 Pro, the LLM first “ponders” the request. Technical documentation and community testing indicate that this model can plan layouts for infographics, check for spelling errors in requested text, and logically deduce the physical properties of objects before rendering them.
For the prompt engineer, this means that prompts can be structured as logic problems or creative briefs. Instead of describing the visual output immediately, a user can provide a goal—”Create a marketing image that appeals to Gen Z gamers”—and the reasoning engine will infer the necessary visual elements (neon lighting, retro consoles, energy drinks) to fulfill that strategic goal. This capability reduces the need for exhaustive visual description if the intent is clearly articulated, shifting the skill set from visual description to strategic communication.
Integration with the Google Ecosystem
The Gemini image generation capability is not an island; it is deeply integrated into the Vertex AI and Google Workspace ecosystems. This integration allows for “Grounding with Google Search,” where the model can verify facts or retrieve up-to-date visual data (like current weather maps or stock charts) to inform the generation. This contrasts sharply with models trained on static datasets with cutoff dates. For professional users, this implies that prompts can reference contemporary events or dynamic data points, provided the “Grounding” feature is active.
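Grounding is exposed as a tool in the Gemini API. The following is a minimal sketch, assuming the google-genai Python SDK and an illustrative model name, of pulling live search context into a session whose output can then inform an image prompt; tool availability varies by model and access tier.

```python
# Minimal sketch: Grounding with Google Search via the google-genai SDK.
# The model name and prompt are illustrative; confirm grounding support
# for your model and project before relying on this pattern.
from google import genai
from google.genai import types

client = genai.Client()  # credentials/API key are read from the environment

response = client.models.generate_content(
    model="gemini-2.0-flash",  # illustrative model name
    contents=(
        "Summarize the current weather pattern over the North Atlantic "
        "as a one-paragraph brief for an infographic."
    ),
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],  # grounding tool
    ),
)
print(response.text)  # grounded brief that can seed a follow-up image prompt
```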
The Linguistics of Prompting: From Keywords to Narrative
The transition to Gemini requires unlearning the “tag-soup” syntax popularized by early Stable Diffusion and Midjourney versions. The model’s core strength is its deep language understanding, which favors narrative, descriptive paragraphs over disjointed lists of keywords.
The Narrative Description Paradigm
Research explicitly states that a “narrative, descriptive paragraph will almost always produce a better, more coherent image than a simple list of disconnected words”. The model parses syntax to understand relationships between objects.
Comparative Analysis of Prompt Structures:
- Ineffective (Legacy Syntax): Astronaut, mars, cafe, futuristic, 8k, unreal engine 5, glowing.
- Critique: This relies on the model’s internal weights for these individual tokens but fails to establish the relationship between them. Is the astronaut building the cafe? Is the cafe on Mars?
- Effective (Gemini Narrative Syntax): Generate a cinematic wide shot of a futuristic cafe located on the surface of Mars. A stoic astronaut, wearing a weathered space suit with glowing blue optics, is acting as a barista behind the counter. The scene is illuminated by the harsh, reddish light of the Martian sun entering through large panoramic windows. Steam rises from a cup of coffee, contrasting with the dusty exterior.
- Analysis: This prompt utilizes the LLM’s understanding of prepositions (“on the surface,” “behind the counter”) and adjectives (“weathered,” “harsh”) to construct a coherent 3D scene. The narrative flow guides the diffusion process to place elements logically rather than randomly.
The “Triangle of Precision”: Subject, Context, Style
To ensure reproducibility and control, professional prompts within the Gemini ecosystem should structurally address three core pillars, often referred to as the “Triangle of Precision”.
Subject Definition
The subject is the focal point of the generation. Specificity here is paramount to avoid generic outputs. Instead of asking for “a cat,” the prompt should specify “a fluffy calico cat wearing a tiny wizard hat”; instead of “a robot,” specify “a stoic robot barista”. The inclusion of adjectives regarding texture, emotion, and attire helps the reasoning engine select specific latent clusters rather than averaging all “cat” images in its training data.
Context and Environment
Context provides the “grounding” for the subject. A common failure mode in AI generation is the “floating object” syndrome, where a detailed subject exists in a void. Prompts must specify the setting (“a futuristic cafe on Mars,” “a cluttered alchemist’s library”) and the action taking place (“brewing a cup of coffee,” “casting a magical spell”). This contextual layering allows the model to calculate lighting interactions—how the environment’s light reflects off the subject.
Stylistic Direction
Without explicit stylistic instruction, Gemini tends to default to a generic “digital art” or “stock photo” aesthetic. Users must explicitly define the medium. This can range from “photorealistic” and “35mm film” to “oil painting,” “charcoal sketch,” or “low-poly isometric render”.
- Artistic References: While the model avoids generating copyrighted imagery, it understands broad artistic movements. Referencing “Impressionism,” “Cyberpunk,” “Bauhaus,” or “Renaissance” triggers specific color palettes and compositional rules.
- Medium Emulation: Keywords like “oil paint textures,” “watercolor bleed,” or “vector art” instruct the model on how to render edges and gradients.
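In a scripted workflow, the three pillars can be enforced with a trivial prompt-assembly helper so every request explicitly supplies a subject, a context, and a style. This is an illustrative sketch, not part of any Gemini API; the function name and example values are invented for demonstration.

```python
# Illustrative helper: composes a narrative prompt from the three pillars of
# the "Triangle of Precision" (subject, context, style) described above.
def build_prompt(subject: str, context: str, style: str) -> str:
    """Compose a narrative prompt covering subject, context/environment, and style."""
    return (
        f"Generate {style} of {subject}. "
        f"{context} "
        f"Keep the style consistent throughout, with coherent lighting and composition."
    )

prompt = build_prompt(
    subject="a stoic robot barista with weathered brass plating",
    context="It stands behind the counter of a futuristic cafe on Mars, "
            "lit by harsh reddish sunlight through panoramic windows.",
    style="a cinematic wide shot in the style of 35mm film photography",
)
print(prompt)
```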
3.3 The Role of the Prompt Rewriter
A hidden dynamic in Gemini is the automated prompt enhancement or “rewriting” layer. When a user submits a short prompt, Gemini often expands it internally to provide the image generator with more detail. While beneficial for novices, this can be detrimental for professionals seeking precise control, as the rewriter may introduce unwanted elements (e.g., adding a smile to a subject intended to be somber).
To mitigate over-processing, prompts can include explicit directives such as “render exactly as described; do not add extra elements.” However, unlike some systems that allow users to disable this feature via a toggle, Gemini users on the standard interface must often rely on “negative constraints” or highly specific descriptive language to override the rewriter’s tendency to add “creative flair.” The API documentation notes an enhancePrompt parameter which can be toggled, but chat users lack this direct control.
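On Vertex AI, the rewriter can be controlled at the API layer rather than through prompt phrasing. The sketch below assumes the documented instances/parameters request shape for the Imagen predict endpoint; the project ID, region, and model version are placeholders, and whether enhancePrompt is accepted depends on the model version you call.

```python
# Minimal sketch: calling the Imagen predict endpoint with the prompt rewriter
# disabled. Project, region, and model version are placeholders; enhancePrompt
# support varies by model version.
import requests
import google.auth
import google.auth.transport.requests

PROJECT = "your-project-id"          # placeholder
REGION = "us-central1"               # placeholder
MODEL = "imagen-3.0-generate-002"    # placeholder model version

creds, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
creds.refresh(google.auth.transport.requests.Request())

url = (
    f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT}"
    f"/locations/{REGION}/publishers/google/models/{MODEL}:predict"
)
body = {
    "instances": [{"prompt": "A stoic robot barista in a futuristic cafe on Mars"}],
    "parameters": {
        "sampleCount": 1,
        "enhancePrompt": False,  # disable the automatic prompt rewriter
    },
}
resp = requests.post(url, json=body, headers={"Authorization": f"Bearer {creds.token}"})
resp.raise_for_status()
```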
4. Photorealistic Simulation and Camera Control
One of the most powerful capabilities of the Gemini/Imagen 3 architecture is its deep understanding of photographic terminology. By invoking specific camera settings, lenses, and film stocks, users can force the model to render images that mimic the physical properties of light and optics, moving beyond “AI-looking” smoothness to genuine photorealism.
4.1 Simulating Lens Physics and Focal Lengths
The model recognizes standard focal lengths and lens characteristics. Using these terms acts as a modifier for the scene’s geometry and depth of field.
- Wide Angle (10mm – 24mm): Used for landscapes, architecture, and dynamic action. Prompts should explicitly mention “wide angle,” “10mm,” or “fisheye” to capture expansive scenes. This often introduces barrel distortion, which can be desirable for artistic effect in street photography.
- Example Prompt: “A photo of the moon, astro photography, wide angle 10mm”.
- Standard (35mm – 50mm): Represents the human eye perspective. This is ideal for environmental portraits and street photography where the context is as important as the subject. It minimizes distortion while maintaining a natural field of view.
- Telephoto/Portrait (85mm – 200mm): Essential for isolating subjects. Keywords like “85mm lens” or “telephoto” trigger background compression and strong bokeh (background blur), visually separating the subject from the environment. This is critical for professional portraiture.
- Macro (60mm – 105mm): For extreme close-ups of insects, flowers, or product textures. The prompt must specify “macro lens,” “extreme close-up,” or “100mm macro” to ensure the focus plane is razor-thin and details like insect eyes or leaf veins are hyper-resolved.
4.2 Lighting Constructs and Atmosphere
Lighting is the defining element of photorealism. Gemini responds to cinematic lighting terminology, allowing users to sculpt the scene’s mood.
- Golden Hour: Soft, warm, directional light, usually low in the sky, creating long shadows and a nostalgic mood. This is a staple for landscape and outdoor portraiture.
- Blue Hour/Twilight: Cool, diffuse light, often used for cityscapes to balance artificial city lights with the natural sky. It creates a mood of melancholy or calm.
- Chiaroscuro/Low Key: High contrast, dramatic lighting with deep shadows and bright highlights. Useful for moody portraits or “Film Noir” aesthetics, emphasizing form over color.
- Volumetric Lighting: Simulates light beams passing through particles (fog, dust, smoke) in the air. This adds depth and atmosphere to a 2D image, often referred to as “God rays”.
- Studio Lighting: Implies a controlled environment. Terms like “softbox,” “rim light,” “three-point lighting,” or “butterfly lighting” are essential for product photography and headshots to ensure even illumination and separation from the background.
4.3 Film Stock and Sensor Simulation
To escape the “plastic” or “waxy” look often associated with AI generation, advanced prompts should specify the recording medium.
- Analog Film: Keywords like “35mm film,” “grain,” “Kodak Portra,” “Fujifilm,” or “Polaroid” introduce noise patterns, color grading, and dynamic range limitations specific to analog photography. This creates a sense of authenticity and nostalgia.
- Example: “A candid 1990s grungy snapshot… strong 35mm film grain, cool color shift, date stamp”.
- High-End Digital: For hyper-clean commercial looks, referencing “Phase One,” “100MP,” “8K resolution,” or “Sony A7R IV” signals the model to prioritize sharpness, edge contrast, and high dynamic range, eliminating noise.
Table 2: Photographic Prompt Modifiers and Visual Effects
| Modifier Category | Keywords | Visual Effect |
|---|---|---|
| Lens | Fisheye, Macro, Telephoto, 85mm, Wide-angle | Alters perspective distortion and depth of field. |
| Aperture | f/1.8, f/22, Bokeh, Shallow depth of field | Controls background blur and focus isolation. |
| Lighting | Golden hour, Rim light, Volumetric, Studio lighting | Determines mood, contrast, and subject visibility. |
| Film/Sensor | 35mm film grain, Polaroid, ISO 800, 8K, HDR | Adds texture, noise, and dynamic range characteristics. |
| Angle | Low angle, Aerial view, Dutch angle, Eye-level | Changes the viewer’s relationship to the subject. |
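For scripted pipelines, these modifier categories can be appended to a narrative base prompt programmatically. The helper below is an illustrative sketch rather than an official API; the sample values are drawn from Table 2.

```python
# Illustrative helper: appends photographic modifiers (per the categories in
# Table 2) to a narrative base prompt. Not part of any official Gemini API.
PHOTO_MODIFIERS = {
    "lens": "85mm telephoto",
    "aperture": "f/1.8, shallow depth of field",
    "lighting": "golden hour, rim light",
    "film": "35mm film grain, Kodak Portra",
    "angle": "low angle",
}

def apply_photo_modifiers(base_prompt: str, modifiers: dict) -> str:
    """Append comma-separated photographic modifiers to a narrative prompt."""
    suffix = ", ".join(modifiers.values())
    return f"{base_prompt} Shot with {suffix}."

print(apply_photo_modifiers(
    "A weathered fisherman mending nets on a rainy pier at dawn.",
    PHOTO_MODIFIERS,
))
```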
5. Technical Constraints and Professional Workarounds
While Gemini is a powerful tool, it operates within strict technical and safety constraints. Mastering the ecosystem requires navigating these limitations effectively, often through creative workarounds that are not officially documented in the user manual.
5.1 The Aspect Ratio Conundrum
A persistent and significant issue reported by users is the tendency of the Gemini chat interface (both web and mobile) to default to square (1:1) images, even when prompted for “16:9,” “landscape,” or “widescreen”. While the underlying Imagen model supports various ratios (16:9, 4:3, 3:4, 9:16), the conversational UI often acts as a bottleneck, overriding these instructions or failing to pass the parameter correctly to the backend.
Verified Professional Workarounds:
- The “Blank Canvas” Reference Method: Users have found success by uploading a blank image or a reference image that already possesses the desired aspect ratio (e.g., a 16:9 white rectangle). The prompt should then instruct Gemini to “use this aspect ratio for the generated image” or “generate the scene within the dimensions of the reference image.” The model attempts to preserve the geometry of the input, effectively forcing the output ratio.
- Explicit Parameter Requests (API/ImageFX): While the chat app is inconsistent, using Google’s standalone ImageFX tool or the Vertex AI API allows for explicit selection of aspect ratios via UI dropdowns or code parameters (aspectRatio: "16:9"). This bypasses the chat interpreter’s limitations entirely and is the recommended workflow for production environments (see the code sketch after this list).
- The Cropping Strategy: A pragmatic, albeit compromised, approach involves prompting for “wide angle” or “zoomed out” compositions within the default square frame. By ensuring the subject is centrally located with ample “safe space” around them, the user can then crop the image to 16:9 in post-production without losing critical details. Prompts using “panoramic view” can encourage the model to render a horizon line suitable for this cropping.
- Transparent PNG Hack: Advanced users have developed tools to create transparent PNGs with specific ratios. Uploading a transparent 16:9 PNG and asking Gemini to “fill this canvas” can sometimes trigger the correct output dimension, although this method is less stable than the ImageFX route.
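As referenced in the API/ImageFX workaround above, the aspect ratio can be set as an explicit parameter rather than negotiated in chat. A minimal sketch, assuming the Vertex AI Python SDK and a placeholder model version:

```python
# Sketch: requesting a 16:9 output via the Vertex AI SDK instead of the chat UI.
# Project, region, and model version are placeholders; parameter names follow
# the ImageGenerationModel interface and may change between SDK releases.
import vertexai
from vertexai.preview.vision_models import ImageGenerationModel

vertexai.init(project="your-project-id", location="us-central1")  # placeholders

model = ImageGenerationModel.from_pretrained("imagen-3.0-generate-002")
images = model.generate_images(
    prompt="A cinematic wide shot of a futuristic cafe on the surface of Mars",
    number_of_images=1,
    aspect_ratio="16:9",  # explicit ratio, bypassing the chat interpreter
)
images[0].save(location="mars_cafe.png")
```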
5.2 Negative Prompting in a Chat Interface
Standard diffusion interfaces (like Automatic1111 for Stable Diffusion) typically feature a dedicated “Negative Prompt” field where users can list elements to exclude (e.g., “ugly, blurry, text, watermark”). Gemini’s chat interface lacks this specific input field, requiring linguistic workarounds.
Strategies for Exclusion:
- Natural Language Exclusion: Instead of a negative tag list, users must use explicit “without” or “free of” statements. However, this is susceptible to the “pink elephant” paradox, where mentioning the object makes the model more likely to generate it due to attention mechanisms in the transformer.
- Positive Exclusion (The “Clean” Approach): The most effective method is to describe the absence of the trait using positive attributes. Instead of “no blurry background,” prompt for “sharp focus throughout” or “deep depth of field.” Instead of “no bad anatomy,” prompt for “anatomically correct,” “perfect symmetry,” or “high fidelity”.
- System Instructions (Gems): For users with access to custom “Gems” (customized versions of Gemini), defining a system instruction that “always produces clean, high-contrast images without artifacts or distortion” can serve as a persistent negative prompt, effectively baking these constraints into the model’s persona.
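Outside the Gems UI, the same persistent-constraint idea can be approximated with a system instruction at the API level. A hedged sketch, assuming the google-genai SDK, an illustrative image-capable model name, and that the chosen model honors system instructions for image output:

```python
# Sketch: baking "positive exclusion" constraints into a system instruction.
# Model name is illustrative; support for system instructions alongside image
# output may vary by model.
from google import genai
from google.genai import types

client = genai.Client()

config = types.GenerateContentConfig(
    system_instruction=(
        "Always produce clean, high-contrast images with sharp focus throughout, "
        "anatomically correct subjects, and no watermarks or stray text."
    ),
    response_modalities=["TEXT", "IMAGE"],  # request image output on supported models
)
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",  # illustrative; substitute an image-capable model
    contents="A studio product shot of a ceramic vase on a marble plinth.",
    config=config,
)
```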
5.3 Safety Guardrails and the “Person” Policy
Gemini operates under strict ethical guidelines regarding the generation of people, particularly named real-world individuals, to prevent the creation of deepfakes and misinformation. This policy has evolved significantly following early controversies.
- Named Figures: Prompts requesting celebrities, politicians, or specific public figures are systematically blocked. The model is trained to recognize these entities and refuse generation.
- Historical Accuracy and Diversity: Following the incident where the model generated historically inaccurate images (e.g., diverse Founding Fathers), Google paused and then retuned the model. The current policy allows for the generation of generic people but attempts to balance diversity with historical context when explicitly prompted. However, users may still encounter refusals if the prompt navigates sensitive cultural or historical topics.
- Workarounds for Character Consistency: To generate a consistent character across multiple images (a “virtual influencer” or story character), users cannot rely on a celebrity name as a seed. Instead, they must define the character’s traits deeply in the first prompt (e.g., “a young woman with short red hair, green eyes, wearing a leather jacket”) and use that session’s context window to place the same character in new scenarios. Community reports suggest using “Nano Banana” (Gemini 2.5 Flash) for consistent character identity across multi-turn conversations, as it is optimized for maintaining subject identity.
6. Genre-Specific Prompting Methodologies
To demonstrate the practical application of these principles, we analyze high-performance prompt structures for distinct visual genres. These templates leverage the specific strengths of the Gemini reasoning engine.
6.1 Commercial Product Photography
The goal in this genre is cleanliness, lighting control, and high resolution. The “Nano Banana Pro” model is particularly adept here due to its enhanced reasoning about object permanence and material properties.
Prompt Template & Analysis:
Generate a high-definition, commercial product shot of [Product]. The product is placed on a [Surface, e.g., polished marble slab] in a [Setting, e.g., minimalist studio]. Lighting is [Lighting Style, e.g., soft rim lighting] to accentuate the [Key Feature]. The background is [Background, e.g., a clean neutral gradient]. 8k resolution, sharp focus, advertising standard.
Mechanism: The use of “advertising standard” or “commercial product shot” acts as a quality booster, triggering specific weights in the model associated with high production value. Specifying “rim light” helps separate the product from the background, a crucial technique in professional photography to create depth.
6.2 Food Photography: The Sensory Approach
Food imagery requires stimulating the viewer’s sensory response (appetite appeal). Key elements are texture, steam, and lighting that enhances freshness.
Prompt Template & Analysis:
A close-up, macro photograph of [Dish]. The food looks hot and fresh, with visible steam rising. The texture of the [Ingredient] is highlighted, appearing [Adjective, e.g., crispy/juicy]. Lighting is natural window light coming from the side, creating soft highlights on the moist surfaces. Shallow depth of field with the background slightly blurred. Shot on a 100mm macro lens. High resolution, food magazine style.
Mechanism: Terms like “moist,” “steam,” and “fresh” are critical. The “natural window light” instruction prevents the food from looking artificial or plastic—a common failure mode in AI food generation where lighting is often too harsh or uniform. The “100mm macro lens” instruction ensures the correct compression and focus fall-off typical of high-end culinary photography.
6.3 Cinematic and Atmospheric Landscapes
This genre leverages Gemini’s understanding of art styles, atmospheric physics, and compositional rules.
Prompt Template & Analysis:
A cinematic, wide-angle landscape shot of [Location]. The time is [Time of Day, e.g., blue hour], with [Atmospheric Condition, e.g., heavy fog] rolling over the terrain. The composition follows the rule of thirds, with a [Focal Point] in the foreground. The color palette is [Color Grade, e.g., teal and orange, desaturated]. Dramatic lighting, high dynamic range (HDR), detailed textures. Shot on 24mm lens.
Mechanism: Specifying “color palette” (e.g., “Nordic palette,” “teal and orange”) is a powerful way to control the emotional tone of the image. The “rule of thirds” instruction utilizes the model’s training on composition principles, ensuring a balanced image rather than a centered, static snapshot.

6.4 Viral Trends: The “3D Caricature” Phenomenon
A viral trend identified in the research is the “3D figurine” or “Pixar-style” caricature, often associated with the “Nano Banana” model capability. This trend exploded on social media due to the model’s ability to render stylized yet physically plausible textures.
Prompt Template & Analysis:
Create a 3D rendered character in the style of a high-quality collectible vinyl toy. The character is [Character Description]. The material looks like smooth, matte plastic with realistic subsurface scattering. Studio lighting with soft shadows. The background is a solid, vibrant color. Cute, expressive, 4k render.
Mechanism: The virality of this style suggests that Gemini’s model is particularly well-tuned for 3D render aesthetics (resembling Redshift or Octane Render styles). The term “vinyl toy” or “subsurface scattering” (a rendering term for how light penetrates translucent surfaces like skin or plastic) is key to achieving the specific “Nano Banana” look that users seek.
6.5 Macro Photography: Insects and Flora
For scientific or artistic macro photography, precision regarding focus and magnification is essential.
Prompt Template & Analysis:
Extreme close-up macro shot of a [Insect/Flower]. Razor sharp focus on the [Detail, e.g., compound eyes or petal texture]. The background is a creamy, smooth bokeh in shades of green. Water droplets are visible on the surface, refracting the light. Natural sunlight. 1:1 magnification, high detail.
Mechanism: The phrase “refracting the light” forces the reasoning engine to calculate complex light physics within the water droplets, adding a layer of realism that distinguishes the image from a simple 2D drawing. “Creamy bokeh” is a specific photography term that dictates the quality of the blur, ensuring it is smooth and not noisy.
7. Advanced Workflows: From Text to Image to Edit
Professional users do not stop at the first generation. The Gemini ecosystem is designed for an iterative workflow, moving from broad concepts to refined assets.
7.1 Multi-Turn Refinement and Conversation
Gemini maintains context across the chat session. If the first image is “almost right,” the user can follow up with natural language corrections without restating the entire prompt.
- Turn 1: “Generate a robot barista.”
- Turn 2: “Make it look more retro, like 1950s sci-fi.”
- Turn 3: “Change the background to a busy diner.”
- Analysis: This conversational approach allows for “style drift” or “concept refinement” that is difficult in single-turn systems. The model retains the core subject identity (the robot) while swapping out environmental variables.
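The same refinement loop can be scripted as a chat session so the model carries the subject’s identity across turns. A minimal sketch, assuming the google-genai SDK and an illustrative image-capable model name:

```python
# Sketch: multi-turn refinement via a chat session; the model retains the
# "robot barista" identity while follow-up turns adjust style and setting.
from google import genai
from google.genai import types

client = genai.Client()
chat = client.chats.create(
    model="gemini-2.0-flash-exp",  # illustrative image-capable model
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

chat.send_message("Generate a robot barista.")
chat.send_message("Make it look more retro, like 1950s sci-fi.")
final = chat.send_message("Change the background to a busy diner.")

# Save any image parts returned in the final turn.
for part in final.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("robot_barista.png", "wb") as f:
            f.write(part.inline_data.data)
```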
7.2 Inpainting and Local Edits
Gemini allows users to highlight a section of a generated image and request specific changes (“Change the cat to a dog” or “Remove the background”). This “inpainting” capability is powered by the model’s ability to understand spatial masks.
- Prompting for Edits: Be specific about the area and the change. Example: “Change the color of the car to red, but keep the reflections consistent.” This relies on the “Instruction-tuned” nature of the model, which is trained to follow editing commands rather than just generation descriptions.
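Programmatically, the same instruction-plus-image pattern drives edits: the prior render is passed back in alongside a targeted directive. A sketch under the same SDK and model-name assumptions as above; the file path is illustrative.

```python
# Sketch: instruction-based edit of an existing image by sending the image and
# a targeted directive together. Model name and path are illustrative.
from google import genai
from google.genai import types
from PIL import Image

client = genai.Client()
source = Image.open("generated_scene.png")  # illustrative path to a prior generation

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",  # illustrative image-capable model
    contents=[
        "Change the color of the car to red, but keep the reflections consistent.",
        source,
    ],
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)
```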
8. Conclusion: The Future of Generative Prompting
The Gemini photo prompting ecosystem represents a maturation of AI image generation. It moves away from the “slot machine” mechanics of early random generation toward a “reasoning-based” creative partner. Success in this environment requires a shift in user behavior: from keyword stuffing to articulate, descriptive storytelling.
The emergence of the “Nano Banana” nomenclature highlights the community’s desire for specific, high-performance tools, while the distinction between “Flash” and “Pro” models offers users a choice between speed and reasoning depth. As the ecosystem evolves with Veo for video generation and deeper ties to Google Workspace, the ability to prompt across modalities—describing not just a static frame but a temporal sequence or a 3D asset—will become the next frontier in prompt engineering.
For the professional user, the foundational skills detailed in this report—narrative precision, technical vocabulary, constraint management, and iterative refinement—will remain the bedrock of high-level AI creation. The future of prompting is not about finding the perfect “magic word,” but about communicating a vision with the clarity and nuance that the reasoning engine demands.
9. Appendix: High-Performance Prompt Library
To facilitate immediate application, the following library consolidates the most effective prompt structures identified in the research, categorized by genre.
Table 3: Master Prompt Library by Genre
| Genre | Prompt Template | Key Mechanism |
|---|---|---|
| Corporate Headshot | “Transform this selfie into a Fortune 500 CEO portrait. Impeccable business attire, confident posture, modern office background with city skyline. Lighting conveys authority/success. 85mm lens, shallow depth of field.” | Authority encoding: Uses “Fortune 500” and “CEO” to trigger specific attire/posture weights. |
| Cyberpunk Portrait | “Gritty, cyberpunk high-contrast portrait. Subject under flickering neon sign (pink/cyan) on wet pavement. HDR lighting, color bleed, exaggerated highlights. Wide-angle street photography, 8K.” | Atmosphere: “Wet pavement” and “neon” create essential reflective textures. |
| Macro Nature | “Extreme close-up macro shot of a dew drop on a spider web. The refraction in the water drop shows the forest behind it. Razor sharp focus on the drop, creamy bokeh background. Natural morning light.” | Physics simulation: “Refraction” forces calculation of complex light paths. |
| Vintage Film | “Candid 1990s grungy snapshot. Young man in flannel, graffiti wall background. Intentionally slightly blurry, strong 35mm film grain, cool color shift, date stamp in corner. Flash photography style.” | Imperfection: Requesting “blurry” and “grain” creates authenticity. |
| Infographic | “Create a modern, clean infographic explaining [Topic]. Use a split-screen comparison layout. Left side [Color A], Right side [Color B]. Flat vector style, clear typography, distinct icons. White background.” | Structural logic: Maximizes text rendering by specifying layout (split-screen). |
| 3D Caricature | “A 3D rendered vinyl toy character of [Subject]. Smooth matte plastic texture, subsurface scattering, soft studio lighting. Solid vibrant background. Cute, expressive, 4k render.” | Material physics: “Subsurface scattering” creates the “Nano Banana” look. |