The field of AI media generation is evolving at a dizzying pace, and video generation has become one of its most active and competitive frontiers. In this context, Google has presented Veo 2, the evolution of its Veo 1 model and its flagship proposal to compete in this emerging space. Developed by Google DeepMind, Veo 2 is positioned as a state-of-the-art model designed to produce high-quality, realistic video, with the aim of offering "unprecedented creative control."

The arrival of Veo 2 comes at a time of intense competition, with key players like OpenAI's Sora, Runway, Kling, and others driving innovation at a remarkable pace. Google claims that Veo 2 redefines quality and control in AI-powered video generation, with the potential to significantly transform creative workflows across various industries.

This article offers a detailed analysis of Google Veo 2. We examine its availability across different Google platforms, its technical specifications, and its key improvements over its predecessor, Veo 1. We also address the model's current limitations, compare it with Veo 1 and relevant competitors (drawing on expert and early user reviews), and evaluate Google's approach to safety and ethics in its development and deployment.

Accessing Veo 2: Platforms, Pricing and Availability

Google's launch strategy for Veo 2 is characterized by a gradual and fragmented rollout. It began with private previews for select creators and filmmakers and has been progressively expanding across various Google products and platforms. The key date was the announcement of its availability on April 15, 2025, for Gemini Advanced users.

Currently, there are multiple ways to access Veo 2, each with its own characteristics and limitations:

  • Gemini API / Vertex AI: This is the primary route for developers and enterprise customers looking to integrate Veo 2 into their own applications, and it is considered production-ready. Access requires API keys, and certain advanced features, such as editing or specific camera controls, may require being on an allowlist. Companies like WPP, Agoda, Mondelez, and Poe are already using or testing Veo 2 through Vertex AI (a minimal request sketch follows this list).
  • Google AI Studio: Offers an experimental environment for developers to test Veo 2's capabilities. Initial access is usually free, but is subject to very strict usage quotas.
  • VideoFX (Google Labs): This is an experimental tool for creators, accessible through Google Labs. It requires registration on a waiting list. Initially, early access was restricted to users over 18 years of age in the US, although Google plans to expand access.
  • Gemini Advanced: Veo 2 is integrated as a feature for subscribers to the Google One AI Premium plan. It allows users to create 8-second videos at 720p resolution, with monthly usage limits that are not explicitly defined (Google states that users will be notified as they approach the limit). It is available globally in the countries and languages where the Gemini apps are supported.
  • Whisk Animate (Google Labs): This experimental feature, also within Google Labs, uses Veo 2 to convert still images into 8-second animated video clips. It is available to Google One AI Premium subscribers in over 60 countries.
  • YouTube Shorts (Dream Screen): Veo 2 integration is being rolled out to YouTube Shorts via the Dream Screen feature. This will allow creators to generate unique video backgrounds using AI or even create standalone video clips from text prompts. The initial rollout will take place in the US, Canada, Australia, and New Zealand.
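
Grounded in the API access described above, the following is a minimal sketch of a text-to-video request, assuming the google-genai Python SDK and a Veo 2 model ID along the lines of veo-2.0-generate-001; the exact method and field names are assumptions and should be checked against Google's current documentation.

```python
# Minimal sketch of a Veo 2 text-to-video request via the Gemini API.
# Assumptions: the google-genai Python SDK, a model ID like "veo-2.0-generate-001",
# and these method/field names; verify against Google's current documentation.
import time

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

# Submit the request; video generation runs as a long-running operation.
operation = client.models.generate_videos(
    model="veo-2.0-generate-001",  # assumed Veo 2 model ID
    prompt="A slow tracking shot of a vintage red car on a coastal road at sunset",
    config=types.GenerateVideosConfig(
        aspect_ratio="16:9",       # 16:9 or 9:16, per the aspect ratios cited below
        number_of_videos=1,
    ),
)

# Poll until the clip is ready ("typically a few minutes, but can take longer").
while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)

# Download the generated MP4 clip(s).
for i, generated in enumerate(operation.response.generated_videos):
    client.files.download(file=generated.video)
    generated.video.save(f"veo2_clip_{i}.mp4")
```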

As for the different prices, they vary significantly between these platforms:

  • API/Vertex AI: The cost is based on the amount of video generated. Sources indicate prices between $0.35 and $0.50 per second, which equates to $21–$30 per minute or $1,260–$1,800 per hour of generated video (see the short cost check after this list). As a launch promotion, Google is offering $300 in free credits, and there may be initial free usage periods with Vertex AI.
  • Subscription: Access via Gemini Advanced and Whisk Animate is included in the Google One AI Premium subscription ($20/month, €21.99 in Spain). In comparison, OpenAI's Sora is offered as part of the ChatGPT Plus ($20/month) and Pro ($200/month) subscriptions.
  • Free/Experimental: Platforms like Google AI Studio and VideoFX (with a waiting list) provide free access, but with significant limitations in terms of quotas and available features.
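
As a quick sanity check on the per-second pricing quoted above, the snippet below converts the published rates into per-clip, per-minute, and per-hour costs; the rates come from the article, and the helper function itself is purely illustrative.

```python
# Quick arithmetic check of the per-second API pricing quoted above (illustrative only).
def video_cost(seconds: float, rate_per_second: float) -> float:
    """Cost in USD of generating `seconds` of video at a given per-second rate."""
    return seconds * rate_per_second

for rate in (0.35, 0.50):
    print(
        f"At ${rate:.2f}/s: 8 s clip = ${video_cost(8, rate):.2f}, "
        f"1 min = ${video_cost(60, rate):.2f}, 1 h = ${video_cost(3600, rate):,.2f}"
    )
# At $0.35/s: 8 s clip = $2.80, 1 min = $21.00, 1 h = $1,260.00
# At $0.50/s: 8 s clip = $4.00, 1 min = $30.00, 1 h = $1,800.00
```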

The following table summarizes the access routes to Veo 2:

Table 1: Summary of Access to Google Veo 2

| Platform | Access Method | Typical User | Key Specifications (Current Access) | Cost Model | Availability Status |
|---|---|---|---|---|---|
| Gemini API / Vertex AI | API key, allowlist (some functions) | Developer, company | Potential 4K / minutes; current API: 720p / 8s | Per second ($0.35–$0.50) | GA; editing in preview |
| Google AI Studio | Login | Developer | 720p / 8s | Free (low quotas) | Experimental |
| VideoFX (Labs) | Login + waiting list | Creator | 720p / 8s | Free (low quotas) | Waiting list (registration) |
| Gemini Advanced | Google One AI Premium subscription | Consumer | 720p / 8s (16:9) | Subscription ($20/month) | GA (global) |
| Whisk Animate (Labs) | Google One AI Premium subscription | Consumer, creator | Image-to-video (8s) | Subscription ($20/month) | GA (60+ countries) |
| YouTube Shorts | Integrated into app | Content creator | Backgrounds / clips (8s?) | Free (integrated) | Rolling out (US, CA, AU, NZ) |

This diversity of access points and pricing models reveals a tiered access strategy by Google. Higher capabilities (potentially 4K, longer videos, advanced controls) and higher prices are reserved for enterprise users and developers via the API, where perceived value and willingness to pay are greater. At the same time, more limited (720p, 8 seconds) but more affordable versions are offered to consumers and creators through subscriptions or free previews. This segmented approach allows Google to manage deployment complexity and the high processing costs associated with video generation, while maximizing potential revenue by adapting to the needs of different market segments.

However, this pricing strategy puts Veo 2 in an interesting position relative to the competition. The high cost per second of the API ($0.35-$0.50) contrasts sharply with Sora's inclusion in ChatGPT's relatively affordable subscriptions ($20-$200 per month). While Sora doesn't yet have a widely available, publicly priced API, this fundamental difference in access models could put competitive pressure on Google's pricing. If OpenAI or other competitors offer APIs with lower unit costs, or if high-quality models become accessible through cheaper subscriptions, professional users who need to generate large volumes of video might find more attractive alternatives to Veo 2's API, potentially forcing Google to reconsider its pricing structure to remain competitive in this key segment.

Technical Capabilities of Veo 2: A Leap into Generative Video

Veo 2 operates primarily through two modes: text-to-video (t2v) generation, where a text description is transformed into a video scene, and image-to-video (i2v) generation, which animates a static image, optionally guided by an additional text prompt that defines style and movement. The model is the result of years of Google research into video generation, building on architectures and lessons learned from previous projects such as GQN, DVD-GAN, Imagen-Video, Phenaki, WALT, VideoPoet, and Lumiere, as well as on the Transformer architecture and Gemini models.

Regarding the technical specifications at launch, Veo 2 represents a significant advance, although with important nuances between its potential and current accessibility:

  • Resolution: The base model is capable of generating video at resolutions of up to 4K. This is an improvement over Veo 1, which reached 1080p. However, many of the currently available public implementations (API/Vertex AI, AI Studio, Gemini Advanced, VideoFX) are limited to 720p, or 1080p in some contexts.
  • Video Length: Veo 2 has the ability to generate clips that exceed one minute or reach up to two minutes of continuous duration, and potentially even longer. This improves upon the capabilities of Veo 1 (>60s). However, current access via API, AI Studio, and Gemini Advanced is often restricted to 8-second clips.
  • Frame Rate: The API and Vertex AI documentation specifies a frame rate of 24 frames per second (FPS). Some comparisons mention 30-60 FPS.
  • Aspect Ratio: Through the API/Vertex AI, 16:9 (landscape) and 9:16 (portrait) formats are supported. The output in Gemini Advanced is 16:9.
  • Output Format: The MP4 format will be used for outputs generated through Gemini Advanced.

Beyond the basic specifications, Veo 2 introduces key qualitative improvements:

Video of a tomato being cut, generated by Veo 2

 

  • Enhanced Understanding and Realism: The model demonstrates an advanced understanding of natural language and visual semantics, accurately interpreting the tone, nuances, and details of long prompts. It uses Transformer architectures (possibly UL2 encoders) for text processing. Crucially, Google highlights the simulation of real-world physics as a key improvement. Examples such as the physics of water, burning paper, or slicing a tomato precisely without cutting the fingers illustrate this capability and position it as a key differentiator against competitors like Sora. This understanding of physics translates into highly accurate motion rendering, with fluid, realistic movement of characters and objects. The result is more realistic and faithful video with fine detail and a significant reduction in visual artifacts (such as extra fingers or unexpected objects) compared to previous models, reportedly employing techniques such as neural scene rendering and adaptive GANs. Temporal consistency has also improved, with characters and objects remaining stable across frames thanks to latent diffusion models. However, as the accompanying video shows, the model still produces impossible imagery, such as a slice of tomato that turns into half a tomato after the cut.
  • Cinematic Control and Styles: Veo 2 interprets the "unique language of cinematography." It understands terms like "timelapse," "aerial shot," "drone shot," "tracking shot," "dolly shot," "close-up," "low angle shot," and "pan right," and even lets you specify the desired genre. It offers extensive camera control over shot styles, angles, and movements, a key advantage. It can simulate specific lens effects (e.g., an "18mm lens" for wide-angle shots) and effects like "shallow depth of field," including lens flare. It supports a wide range of visual and cinematic styles (an illustrative prompt appears after this list).
  • Editing Capabilities (Preview/Allowlist): Veo 2 introduces more sophisticated editing features, although these currently require access via an allowlist in Vertex AI. These include masking or inpainting, for removing unwanted elements (logos, distractions) from defined areas of the video, and outpainting, for extending the video frame by filling the new areas generatively, useful for changing aspect ratios. Interpolation for creating smooth transitions between still images and general editing capabilities for refining or reviewing content without starting from scratch are also mentioned.
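
To make this cinematographic vocabulary concrete, here is an illustrative prompt assembled from several of the terms listed above; the wording is our own example rather than one published by Google, and it could be passed as the prompt argument of the API sketch shown earlier.

```python
# Illustrative prompt built from cinematic terms Veo 2 is said to understand.
# The scene description is our own example, not an official Google prompt.
prompt = (
    "Cinematic drone shot over a misty pine forest at dawn, slow pan right "
    "toward a mountain lake, 18mm lens, shallow depth of field, subtle lens "
    "flare, muted color palette, documentary style"
)
# This string could be passed as the `prompt` argument in the earlier API sketch.
```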

Google's strong emphasis on understanding physics and motion in Veo 2 is no accident. It appears to be a central architectural focus, aimed at addressing a significant weakness observed in previous models and competitors like Sora (evidenced by the tomato slicing example). By positioning realism as its core value proposition, Google is directly targeting professional use cases (film previsualization, advertising, training) where unnatural motion breaks immersion and credibility. This focus strategically differentiates Veo 2 in the market, appealing to users who prioritize fidelity over, perhaps, pure speed or more abstract creative freedom.

However, a significant gap exists between the advertised potential and the reality accessible to many users. The difference between the touted ability to generate multi-minute 4K videos and the actual experience of obtaining 8-second, 720p clips creates a marketing challenge and can lead to disappointment. It suggests that while the core model is powerful, scaling and optimizing it for broad and affordable access remains a considerable technical hurdle, likely due to high computational costs, inference times, or potential consistency and security issues at longer durations. This discrepancy affects user perception: they see impressive demonstrations but interact with a less capable tool, which could damage the product's reputation despite its underlying potential.

Finally, the emphasis on specific cinematic controls (lenses, shot types, depth of field) is clearly geared toward professional filmmakers and creators. This approach aligns with the API's higher pricing model and enterprise collaborations, suggesting an initial goal of disrupting professional workflows. Google appears to have identified a primary market in professional content creation (advertising, film previsualization, marketing) where these controls offer significant value that justifies the cost, beyond mere consumer entertainment.

From Veo 1 to Veo 2

To fully understand the advancements of Veo 2, it's helpful to first establish the baseline of its predecessor. Veo 1 already offered remarkable capabilities: video generation up to 1080p, durations exceeding 60 seconds, understanding of cinematic terms, picture-to-video generation, application of editing commands, improved consistency through latent diffusion, and the implementation of SynthID watermarks and security filters.

Veo 2 represents a significant evolution on this basis, with key improvements in several areas:

  • Resolution: The most obvious leap is Veo 2's resolution target, which reaches up to 4K, surpassing Veo 1's maximum of 1080p.
  • Realism and Fidelity: Veo 2 introduces significant improvements in detail, realism, and artifact reduction compared to previous models and competitors. It produces fewer visual hallucinations, although, as you can see in the video accompanying this news article, this isn't always the case.
  • Motion and Physics: It features "advanced motion capabilities" and improved simulation of real-world physics, going beyond Veo 1's focus on consistency.
  • Camera Control: Offers "greater" and more precise camera control options, expanding the understanding of cinematic terms already possessed by Veo 1.
  • Video Length: The potential duration is extended, exceeding the one minute offered by Veo 1.
  • Editing: Introduces more sophisticated editing capabilities such as inpainting and outpainting (in preview), which go beyond the editing commands described for Veo 1.

The following table directly compares the key capabilities of Veo 1 and Veo 2:

Table 2: Comparison of Features Veo 1 vs. Veo 2 

| Feature | Veo 1 | Veo 2 |
|---|---|---|
| Maximum resolution | 1080p | Up to 4K (potential) |
| Maximum duration (potential) | > 60 seconds | Up to 2 minutes or more |
| Physics / movement | Focus on consistency | Advanced physics simulation, realistic movement |
| Realism / fidelity | High quality | Significant improvements, fewer artifacts |
| Cinematic control | Understanding of terms | Greater precision and options (lenses, etc.) |
| Editing functions | Basic editing commands | Inpainting, outpainting (preview) |

This progression from Veo 1 to Veo 2 illustrates an iterative improvement strategy by Google. The advancements in resolution, realism, physics, and control are not random; they focus on fundamental aspects of video quality and control that are crucial for professional adoption. This pattern suggests a structured development process, demonstrating a long-term commitment to refining the underlying technology.

Limitations and Challenges of Veo 2

Despite its impressive capabilities, Veo 2 is not without limitations and challenges, both inherent in current AI video generation technology and specific to its implementation and deployment.

  • Prompt Complexity and Adherence: Although natural language understanding has improved significantly, Veo 2 still struggles with extremely complex or detailed prompts, occasionally failing to follow all instructions accurately. Prompt engineering remains crucial for good performance. While benchmarks indicate high prompt adherence scores, there are instances where the model falls short of expectations.
  • Artifacts and Consistency: The generation of visual artifacts, although reduced, has not been completely eliminated. Occasional subject deformities, illegible text, or "hallucinations" such as extra fingers or unexpected objects may appear. Temporal consistency may fail in very complex scenes or those with fast movements, and the physics simulation may break down in particularly complex scenarios. Some user-generated examples have been described as "unnatural" or "disturbing."
  • Generation Speed: The time required to generate a video can be considerable. Some comparisons cite around 10 minutes per clip, which contrasts with the approximately 5 minutes attributed to Sora. However, some integrations, such as YouTube Shorts, appear to operate much faster. The API's latency is officially described as "typically a few minutes, but can take longer."
  • Editing Tools: The lack of built-in editing tools in some of the access interfaces (APIs, possibly the initial version of Gemini Advanced) forces users to resort to external software to make modifications. More advanced editing features in Vertex AI require access via an authorized user list. Sora, on the other hand, includes built-in editing tools.
  • Available Controls: Some early users noticed that the version of Veo 2 they tested lacked controls for video resolution or duration compared to Sora. However, the API/Vertex AI does offer parameters to control duration, aspect ratio, negative prompts, and the generation seed (a hedged parameter sketch follows this list).
  • Access and Cost: As we've detailed, fragmented access, waiting lists, geographical restrictions, and high API costs represent significant barriers to adoption. Quotas on the free tiers are currently very low, and given the recent launch, it will take time to fully evaluate their impact.
  • Content Restrictions and Security Filters: Google's security filters are strict and can unexpectedly block content generation, even for seemingly harmless prompts. There are specific restrictions on generating images of people, especially minors (controlled by parameters like allow_adult or disallow in the API). Users have reported problems generating videos even from images containing people, or from scenes without them. This excessive censorship can render the tool unusable for certain use cases.
  • Capacity Limitations: Currently available versions lack sound generation. The difficulty in generating realistic hands remains a common problem across all AI models.
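
As a rough illustration of the controls mentioned in this list, the sketch below shows how such a generation configuration might look with the google-genai SDK; every field name and allowed value here is an assumption to verify against current documentation, some options (such as the seed) may only be exposed through Vertex AI, and the person-generation values may be spelled differently from the names quoted in this article.

```python
# Hedged sketch of the generation controls mentioned above (google-genai SDK style).
# Field names and allowed values are assumptions; verify against current documentation.
from google.genai import types

config = types.GenerateVideosConfig(
    aspect_ratio="9:16",              # 16:9 (landscape) or 9:16 (portrait)
    duration_seconds=8,               # current public access tops out at 8-second clips
    negative_prompt="text overlays, watermarks, extra fingers",
    seed=42,                          # fixed seed for more repeatable generations
    person_generation="allow_adult",  # or a "do not allow people" value; exact name varies
)
```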

These limitations highlight an inherent trade-off between capability and usability. While Veo 2 boasts high-end capabilities (4K potential, realistic physics), restrictions on speed, accessible controls (in some versions), the lack of integrated editing, and strict content filters significantly impact practical usability. Compared to competitors that might be faster, more integrated, or less restrictive (such as Sora or Runway), Veo 2 users could potentially gain superior quality at the cost of a more cumbersome or limited user experience. This may affect adoption, especially for iterative or time-sensitive workflows.

Furthermore, reports of overly aggressive content filters blocking harmless prompts suggest a possible overreaction in Google's prioritization of security and brand risk mitigation. This caution could stem from past controversies with other AI models (such as Gemini's image generation). While security is paramount, overly stringent filters can render the tool unusable for many common applications (e.g., animating family photos), creating a significant risk-aversion-driven limitation.

Finally, the combination of capacity gaps (720p/8s vs. 4K/minutes), usability issues (speed, variable controls), and access barriers amplifies the "demo vs. reality" problem. The average user's experience may differ significantly from the polished demos presented by Google, which could damage credibility if expectations are not carefully managed. This significant gap between the promise and the reality experienced by the user can lead to disappointment and a negative perception, despite the technological achievement that Veo 2 represents.

Veo 2 vs. Sora and Others

Veo 2's position in the market is largely defined by its comparison with its main rival, OpenAI's Sora, as well as Runway.

Direct Comparisons (Veo 2 vs. Sora):

  • Quality/Realism: Numerous sources and early adopters cite Veo 2 as superior in terms of realism, physics simulation, and visual detail. Sora, on the other hand, sometimes struggles with fine details (such as hands) and physics. Some analyses suggest that Sora might be more "artistic" or creatively flexible.
  • Resolution: Veo 2 has the potential for up to 4K, while Sora is limited to 1080p.
  • Duration: The potential of Veo 2 (more than 1-2 minutes) exceeds the stated duration for Sora (20 or 60 seconds). However, the actual access to Veo 2 is usually shorter (8 seconds).
  • Speed: Veo 2 (approx. 10 min) is generally slower than Sora (approx. 5 min). It's important to note the existence of "Sora Turbo," a possibly faster and cheaper version, but potentially of lower quality than the original Sora demos.
  • Controls: Veo 2 is praised for its cinematic controls, while Sora is noted for its flexibility and features like storyboarding. However, MKBHD found that his trial version of Veo 2 had fewer controls than Sora.
  • Editing: Veo 2 lacks built-in editing (except in Vertex AI with allowlist); Sora offers built-in tools (Remix, Loop, Blend).
  • Access/Price: Access to Veo 2 is fragmented and the API cost is high; Sora is accessible through more affordable subscriptions. Currently, Sora is more accessible to the general public.

Benchmarking and Other Competitors:

The results of the MovieGenBench benchmark, where human evaluators rated videos generated from over 1,000 prompts, showed that Veo 2 outperformed Sora Turbo, Kling, and MovieGen in both overall preference and prompt adherence (evaluated at 720p with varying durations). However, it is crucial to recognize the limitations of these benchmarks, which may use cherry-picked results or be based on specific datasets.

The competitive landscape also includes Runway (with Gen-3 Alpha/Gen-4), Kling, AWS Nova Reel, Hailuo, Minimax, and potentially Meta MovieGen. Some users even express a preference for Runway or Hailuo over the current version of Sora they have access to.

The following table provides a snapshot comparison of Veo 2 against its main competitors:

Table 3: Comparative Snapshot of AI Video Generators

| Feature | Google Veo 2 | OpenAI Sora | Runway (Gen-3/4) |
|---|---|---|---|
| Main strength | Realism, physics, cinematic control | Speed, creative flexibility, editing | Fine control, specific modes (implicit) |
| Max resolution | 4K (potential) | 1080p | Variable (720p–1080p+ depending on plan/version) |
| Max duration | 2 min+ (potential) | 20s / 60s | ~15s (Gen-2), longer in Gen-3/4 (variable) |
| Speed | Slower (~10 min) | Faster (~5 min) | Fast (Gen-4 real-time?) |
| Editing tools | Limited / external (API) | Integrated (Remix, Loop, etc.) | Integrated (implicit) |
| Access model | Fragmented (API, subscriptions, Labs) | ChatGPT subscription | Subscription / credits |
| Pricing | API: $/sec; subscription: $20/month | Subscriptions: $20/$200 per month | Annual plans ($144–$1,500) |

This comparison suggests a possible market segmentation based on each tool's strengths. Veo 2 appears to target high-fidelity professional use that values cinematic quality and physical accuracy. Sora could appeal to a broader audience of content creators for social media and creative experimentation, thanks to its speed, flexibility, and integrated editing capabilities. Runway, with its iterative approach and potentially specific features, could find its niche among visual artists and VFX professionals. The market doesn't appear monolithic; different tools are likely to coexist, serving distinct segments based on their core capabilities.

It is crucial to apply the "released version" caveat when evaluating these comparisons. Often, the public version of one model (such as "Sora Turbo," which some users consider inferior to the initial demos) is compared with carefully selected demos or limited-access versions of another (Veo 2). This makes it difficult to draw definitive judgments. The "best" model may depend heavily on which specific version is being evaluated and under what conditions, making superiority a shifting goal.

Finally, there is a recurring hypothesis regarding Google's data advantage. Several sources speculate that Google's direct and massive access to YouTube data gives it a significant advantage in training Veo 2 to achieve realistic movements and understand diverse scenarios, compared to competitors who might need to resort to data scraping. While not officially confirmed, this access to such a vast and potentially tagged video dataset could be a crucial long-term competitive advantage, potentially explaining Veo 2's perceived edge in realism and making it difficult for others to legally and effectively replicate.

Safety and Ethics in Veo 2

Google has emphasized its commitment to responsible AI principles in the development and deployment of Veo 2. The company states that it has conducted extensive red teaming testing and assessments to prevent the generation of content that violates its policies. Two main technical mechanisms underpin this approach:

  • SynthID Watermark: This technology is a key security feature implemented in Veo 2 and other Google generative models. It is an invisible digital watermark embedded directly into the pixels of video frames during generation. It is designed to be persistent even if the video is edited (cropped, filtered, compressed) and does not affect the perceptible visual quality. Its purpose is to allow the identification of content as AI-generated using specialized detection tools, thus helping to combat misinformation and misattribution.
  • Security Filters: Veo 2 incorporates filters designed to prevent the creation of harmful content. The API includes specific parameters to control the generation of people, such as allow_adult (allow only adults, default value) or disallow (do not allow people). However, as mentioned earlier, there are user reports indicating that these filters can be overly restrictive.

Beyond these technical measures, the deployment of Veo 2 is part of a broader ethical landscape with several key concerns:

  • Deepfakes and Disinformation: The ability to generate realistic videos carries the inherent risk of creating convincing deepfakes to spread false information or carry out malicious impersonations. SynthID is Google's primary technical defense against this risk.
  • Intellectual Property and Copyright: Ownership of AI-generated content remains a legal gray area. Furthermore, concerns arise regarding the data used to train these models, such as the potential use of YouTube videos for this purpose without explicit consent.
  • Biases: As with any AI model trained on large datasets, there is a risk that Veo 2 will perpetuate or amplify existing social biases in its results, although Google claims to take steps to mitigate this.
  • Job Displacement: The increasing capabilities of these tools are raising concerns about their impact on creative industries, with the potential displacement of roles in film, animation, marketing, and design. One cited study estimates a significant impact on jobs in the US by 2026.

Google's prominent deployment of SynthID in its generative models represents a proactive technical approach to addressing the risks of misinformation. Embedding the watermark during generation is a built-in preventative measure, unlike post-hoc detection. This suggests that Google considers watermarking essential for responsible deployment. However, the success of this strategy depends on the actual robustness of the watermarks and the widespread adoption of reliable detection tools. It is a technical solution to a complex socio-technical problem.

The tension between implementing robust security filters and maintaining user usability, as evidenced by complaints, underscores a fundamental dilemma for AI developers: security versus usability. Overly stringent filters can render a tool useless, while lax filters increase risks. Finding the right balance is an ongoing challenge, with significant implications for user adoption and societal impact. Google's current calibration appears to lean toward caution, which could affect its competitiveness if users find the tool too restrictive for their needs.

Finally, features like SynthID and configurable (albeit imperfect) security parameters represent Google's attempt to embed ethical considerations into the product's design itself. This goes beyond policy statements to reach the technical implementation. While execution may have flaws (overly strict filters), the approach of integrating security into the tool's architecture reflects a specific stance on the responsible development of AI, seeking to enforce ethical use through the technology itself.

Impact and Future Trajectory of Veo 2

The launch and evolution of Veo 2 have significant implications that extend beyond its technical specifications, potentially affecting multiple industries and redefining creative processes.

Impact on Creative Industries:

Veo 2 has the potential to revolutionize workflows in several sectors:

  • Film: It can streamline previsualization and concept testing, generate background assets, and even produce complete short films. Collaboration with filmmakers like Donald Glover and his studio Gilga underscores this approach.
  • Marketing and Advertising: It enables rapid ad prototyping, the generation of customized advertising content at scale, and the creation of product demonstrations. Companies like Mondelez, WPP, Agoda, AlphaWave, and Trakto are already exploring it. Reported benefits include a drastic reduction in production time (from weeks to hours, according to Kraft Heinz) and less reliance on stock footage.
  • Video games: Can be used to generate realistic cinematics or promotional material.
  • Education and Training: Facilitates the creation of illustrative videos to explain complex concepts or simulate procedures (e.g., medical training).
  • Social Media: Integration with YouTube Shorts and the ability to generate short, engaging clips make it a powerful tool for content creators on platforms like TikTok.

Democratization vs. Disruption:

Veo 2 embodies a duality: on the one hand, it democratizes high-quality video production, making it accessible to small businesses and individual creators who previously lacked the necessary resources or technical skills. On the other hand, it threatens to disrupt traditional roles in the creative industries and fuels concerns about the proliferation of low-quality or automatically generated "AI slop" content.

Future Development:

Users expect later versions of Veo 2 to include improvements such as:

  • Capabilities Expansion: Continuous quality improvement, wider rollout of 4K and longer-duration generation, and possibly the addition of sound generation.
  • Ecosystem Integration: Greater integration with other Google products such as Vertex AI, YouTube, and potentially Search and the Gemini ecosystem. A combination with Gemini is envisioned to enhance the model's understanding of the physical world.
  • Rapid Evolution: The pace of development will continue to be accelerated, driven by intense competition in the field, with developments expected in the coming years.

The analysis suggests that tools like Veo 2 don't eliminate creative work, but rather shift the bottleneck. The main difficulty no longer lies so much in the technical execution (filming, editing, visual effects), but in ideation, prompt engineering, and editing the generated content. Success will increasingly depend on creative vision and the ability to communicate effectively with AI. Creative direction and the ability to formulate precise and evocative prompts become critical skills.

Rather than a complete replacement, the most likely short-term impact is the emergence of "AI-augmented" professional roles. Professionals in film, marketing, design, and other fields will use tools like Veo 2 to improve their productivity, accelerate iteration, and explore new creative possibilities. This will require adaptation and the development of new skills focused on the effective use of these tools, transforming existing roles rather than eliminating them entirely in many cases.

Finally, the integration of Veo 2 into the Google ecosystem (Gemini, Vertex AI, YouTube, Labs) is a clear strategic move. It aims to create synergies (using Gemini to generate prompts, images for I2V inputs, and YouTube data for training) and encourage user retention within its platforms. This holistic approach could provide a competitive advantage over standalone tools, making Google's AI offering more attractive than the simple sum of its parts for users already familiar with its ecosystem.

Videos generated by Veo 2

Here are several videos generated by Veo 2. As you will see, Veo 2 tends to generate impossible elements; below each video we indicate the prompt used.

Video of a parakeet hitting a windowpane with its beak, generated by Veo 2

 

Video of a passenger plane flying among clouds with a person on top of the fuselage, generated by Veo 2

 

Disney-style video of a rabbit reading a book, generated by Veo 2

 

