LLMs are actually braindead

AI learns while sleeping, social network for AI, Apple's dethroned, secret book of knowledge, AI works but we don't know why, OpenAI abandons profit, o3 good at geoguessr, and what is zldksnflqmtm?

May 11, 2025

The Big Picture

OpenAI Wants Chrome

In a surprising twist from a court hearing, OpenAI's ChatGPT chief stated the company would be interested in buying Google's Chrome browser if it were spun off. This possibility arises from Google's ongoing federal antitrust case.

Nick Turley, head of ChatGPT, confirmed this interest under questioning, suggesting OpenAI would be among numerous parties looking to acquire the dominant web browser.

This development underscores how major tech firms are considering strategic moves and potential acquisitions in response to regulatory pressures on competitors. It indicates OpenAI's potential interest in securing a significant distribution platform for its AI technologies.

Figma Owns Dev Mode

Figma, the design giant, has initiated Figma's trademark fight over Dev Mode by sending a cease-and-desist to the AI startup Lovable. This legal move over a seemingly common term is raising eyebrows in the tech world.

Figma successfully trademarked "Dev Mode" last year and is determined to defend it rigorously. They worry the term could become generic if others use it, a crucial concern as they prepare for an IPO.

Lovable, a rising no-code AI startup, recently launched its own "dev mode" feature and has no intention of backing down. Their CEO, Anton Osika, suggests Figma focus on product development instead of policing terminology.

The dispute highlights the tension between established giants like Figma, valued at $12.5 billion, and innovative newcomers like Lovable, known for its vibe coding capabilities. It's a classic big tech vs. startup clash playing out over intellectual property.

iPadOS Becomes macOS?

A significant change is coming to Apple's tablet operating system, as iPadOS 19 is set to become more like macOS in three key ways, according to Bloomberg's Mark Gurman. This news addresses a frequent desire from users who feel the iPad Pro's hardware potential is limited by its software.

The three main areas targeted for improvement are productivity, multitasking, and app window management. Gurman reports that the goal is explicitly to make the device operate more like a Mac.

For a long time, iPad power users have been asking Apple for a more robust and capable operating system that takes full advantage of the powerful chips inside iPads.

While specific details aren't available yet, these changes are expected to be unveiled during the WWDC 2025 keynote on Monday, June 9, alongside iOS 19 and other software updates.

Redis Reverts AGPL

After controversially abandoning its open-source roots just a year ago, Redis has reverted to an AGPL license for its latest version, Redis 8. This move comes as the community-driven fork, Valkey, which emerged in response to Redis's previous license change, continues to gain significant traction and industry backing.

Redis CEO Rowan Trollope stated that the return of Salvatore Sanfillipo influenced key decisions. These include adopting the AGPL, introducing new features like vector sets, integrating Redis Stack technologies into the core, and delivering significant performance improvements.

However, the reaction from the open-source community is largely skeptical. Experts point to Redis's history of license changes, including the failed Commons Clause and the SSPL, as having severely eroded trust.

While AGPL is an OSI-approved open-source license, many organizations avoid it due to its restrictive terms regarding code reuse and distribution. Critics suggest this choice might be more of a marketing tactic and could inadvertently solidify support for Valkey, which operates under the more permissive BSD license and is governed by the Linux Foundation.

The repeated shifts create significant uncertainty for enterprises building long-term strategies. This makes it a challenging path for Redis to win back the developers and companies that have already moved on, especially with the thriving Valkey alternative.

$9B for AI Coding?

Reports suggest Anysphere valued at $9 billion in a massive new funding round for its AI coding tool, Cursor. Thrive Capital is reportedly leading the $900 million investment.

This staggering valuation comes just four months after the company reportedly raised $100 million at a $2.6 billion valuation from Thrive and a16z. The speed at which its perceived value is increasing is remarkable.

Existing investors Andreessen Horowitz (a16z) and Accel are also said to be participating in this latest round. This indicates strong continued confidence from early backers.

The AI coding sector is attracting significant investor interest, with rivals like Windsurf also seeking substantial funding. However, Anysphere's reported valuation leap highlights its position in the market.

Apple Judge Criminal Probe

A US federal judge has ruled that Apple violated App Store order, leading to a referral for a potential criminal contempt investigation against the company and one of its executives. The judge stated that Apple failed to comply with her prior injunction aimed at increasing competition for app downloads and payment methods.

This ruling stems from the antitrust lawsuit brought by Epic Games, the maker of Fortnite, which accused Apple of stifling competition and overcharging commissions on in-app purchases within its lucrative App Store ecosystem.

The judge found that Apple deliberately impeded developers' ability to direct users to alternative payment options outside the App Store. Apple allegedly imposed a new 27% fee on off-app purchases and displayed warnings designed to deter customers from using non-Apple payment methods.

U.S. District Judge Yvonne Gonzalez Rogers was highly critical of Apple's actions, calling them "willful disregard" of her court order and stating that Apple's attempts to interfere with competition "will not be tolerated." She specifically referred Apple and its Vice President of Finance, Alex Roman, for the criminal investigation, citing Roman's testimony as "replete with misdirection and outright lies."

As a result of the ruling, Apple is immediately barred from impeding developers' communication with users and must stop levying the new commission on off-app purchases. The judge emphasized she would not pause the ruling given the "repeated delays and severity of the conduct."

Apple stated they "strongly disagree with the decision," plan to comply with the immediate order, and will appeal the ruling. Epic Games CEO Tim Sweeney hailed the decision as a "significant win" for developers and consumers, stating it forces Apple to compete rather than block other payment services, and announced plans to bring Fortnite back to the App Store next week.

Zero-Click Apple RCE

Security researchers at Oligo have uncovered a critical set of vulnerabilities in Apple's AirPlay protocol and SDK, enabling wormable zero-click RCE attacks on millions of Apple and third-party devices. This means an attacker can potentially compromise a vulnerable device without any user interaction and use it as a launchpad to infect other devices on the network.

Dubbed "AirBorne," these flaws open the door to a wide range of malicious activities beyond RCE, including sensitive information disclosure, local file reads, user interaction bypasses, and denial-of-service attacks. The most severe vulnerabilities, like CVE-2025-24252 and CVE-2025-24132, are the ones that make the zero-click and wormable exploits possible on macOS and AirPlay SDK-enabled devices like speakers and CarPlay units.

The vulnerabilities stem partly from how AirPlay handles proprietary data formats like plists, where issues like type confusion can occur during argument parsing. Oligo reported 23 flaws in total, leading to 17 CVEs being issued by Apple.

Apple has collaborated with Oligo and released updates to address these issues across various platforms, including macOS, iOS, iPadOS, tvOS, and visionOS, as well as the AirPlay SDK and CarPlay.

The most important step for users is to immediately update all their Apple and third-party AirPlay-enabled devices to the latest software versions. Additionally, disabling the AirPlay Receiver when not in use or restricting access through firewall rules and settings can further enhance security.

Lovable 2.0

The big leap is the new Chat Mode Agent. It's not just following instructions; it's "agentic," meaning it can reason, search your files, check logs, and even query your database to figure things out.

Teaming up is easier too. New workspaces in Lovable 2.0 offer dedicated spaces for Pro users and full team collaboration with shared credits for Teams subscriptions.

They've added a Security Scan feature to help you spot vulnerabilities before publishing, especially if you're using Supabase. Plus, Dev Mode lets you jump into the code directly when you need fine-grained control.

Visual edits are more robust, and they've even built custom domain connection right into the platform. Pricing is simplified into Pro and Teams plans. This feels like a significant step forward for AI-assisted development.

AI Chat Social Network

The new Meta AI app steps into the ring against ChatGPT, but its most surprising feature is a social twist: a Discover feed where you can see how your friends use AI. This feed allows users to share their interactions with Meta AI on a prompt-by-prompt basis, appearing alongside content from friends on Instagram and Facebook.

Meta's VP of product says the goal is to demystify AI and demonstrate its capabilities to a wider audience. Users can engage with these shared AI posts by liking, commenting, sharing, or even remixing them into their own conversations with the assistant.

Beyond this unique social element, the app functions as a standard AI assistant, offering text and voice chat, image generation, and access to real-time web information. There's also an optional, more advanced voice mode based on Meta's full-duplex research, designed for more natural, conversational turn-taking, available in select regions.

For users in the US and Canada, Meta AI can personalize responses by drawing information from their Facebook and Instagram profiles, similar to how ChatGPT allows users to instruct it to remember specific details. The app is powered by a Meta-tuned version of the Llama 4 model.

While Meta anticipates the majority of Meta AI usage will continue through its integration into existing apps like Instagram (where it's already reached nearly a billion users via features like the search bar), they believe a standalone app provides the most intuitive AI assistant experience. Notably, the new Meta AI app replaces the existing companion app for Meta's Ray-Ban smart glasses, signaling the company's strategic view of AI as a combined software and hardware initiative, especially as upcoming glasses models incorporate more advanced AI functions like real-time translation and heads-up displays.

Windsurf's $3B Acquisition

Get ready for a massive move in the AI world: OpenAI has reached an agreement to buy startup Windsurf for $3 billion, marking their largest acquisition to date. This deal, if it closes, is a major statement about where OpenAI sees future growth.

Windsurf, previously known as Codeium, specializes in AI-assisted coding tools. Imagine having an AI pair programmer that helps you write code faster and with fewer errors.

Sources familiar with the matter confirmed the agreement, though they spoke anonymously as the details are private. Neither OpenAI nor Windsurf have publicly commented on the potential acquisition, and the deal is not yet finalized.

Claude Integrations

Claude is stepping up its game, becoming a truly integrated research partner. This latest update means Claude integrates with your work context more deeply than ever before, combining it with web search for unprecedented capability.

The new Research feature allows Claude to act agentically, performing multiple searches that build on each other. It systematically explores your questions across both your internal files and the web, delivering comprehensive, cited answers surprisingly fast.

Through the Google Workspace integration, Claude can now securely access your Gmail, Calendar, and Google Docs. This eliminates the need to manually feed it information, allowing it to automatically pull up meeting notes, find action items in emails, or reference relevant documents.

For Enterprise users, Google Docs cataloging creates a specialized index of your organization's documents. This improves Claude's ability to find buried information across files, using secure retrieval augmented generation techniques while maintaining data confidentiality.

These features are rolling out now, with Research and Web Search available in beta for various paid plans and regions, and the Google Workspace integration also in beta for paid users to enable in settings. More content sources are planned for the future.

OpenAI Abandons Profit

Huge news coming out about OpenAI's future structure that completely changes the game!

Reports are flooding in that OpenAI is abandoning its previously discussed plan to move towards a traditional for-profit setup. This is a massive shift from what many expected.

Matthew Berman covered this breaking story, highlighting the surprising decision and what it could mean for the AI landscape going forward.

This move signals a renewed focus on their foundational mission, detailed in a recent update on their official website about evolving their structure. It's definitely something to watch closely!

Signal Clone Hacked Minutes

It turns out the unofficial Signal app used by some in the US government for secure communication was hacked in just about 15 minutes. This app, called TM Signal by an Israeli company, was designed to archive messages for historical records, a feature the standard Signal app lacks.

The surprising part is that this app, used by high-ranking officials, isn't widely known or publicly available. When it was revealed, the security community started looking into it, and one hacker managed to compromise a server where the app sent data.

The hacker accessed debugging data that included some message content, names, contact information, and even usernames and passwords for the backend panel, allowing them to see a list of users with government emails. TeleMessage has since removed features pages and made videos private, but the fact remains that unencrypted data was exposed despite claims of end-to-end encryption to the archive.

This raises serious concerns, especially given how quickly the access was gained. If a "grey hat" hacker could do it in minutes, what could state-sponsored actors achieve?

Separately, the CEO of the VPN company Windscribe faced legal trouble in Greece because a customer used their service for hacking. Law enforcement charged the CEO directly, seemingly not understanding how VPNs or ISPs work.

After a two-year legal battle that cost tens of thousands, the case was finally dismissed. This situation, while costly for the CEO, inadvertently served as a public demonstration of Windscribe's zero-log policy holding up in court.

Finally, the significant Disney data leak from last year wasn't a targeted corporate hack by a Russian activist group as initially portrayed. It turns out it was a single 25-year-old Californian who infected a Disney employee's computer with malware disguised as AI art software.

The malware stole the employee's password manager credentials, giving the hacker access to Slack and other accounts. The hacker then tried to extort the employee, and when that failed, leaked the Disney data along with the employee's entire digital life, leading to the employee being fired and having his personal accounts, including his kids' Roblox, compromised. The hacker has now pleaded guilty and faces up to 10 years in prison.

GTA 6 Trailer May 2025

Rockstar Games has released the second Grand Theft Auto VI trailer, and it's already garnered over 97 million views! This new look takes us deep into the sun-drenched, neon-soaked world of Vice City and the sprawling state of Leonida.

We're introduced to the game's protagonists, Jason and Lucia, as they navigate a dangerous criminal underworld. The trailer hints at a story where an easy score goes wrong, forcing them into a deepening conspiracy.

The developers state that all the footage shown was captured entirely in-game on PlayStation 5, a mix of both direct gameplay and cinematic cutscenes. Keep an ear out for the trailer's catchy tune, "Hot Together" by The Pointer Sisters.

Under the Radar

Rate Limits Visualization

Ever wonder how services like Twitch chat or popular APIs prevent spam or resource hogging? Rate limiting is the invisible force controlling traffic, and visualizing rate limiting algorithms reveals exactly how these systems keep things fair. See the difference between allowing legitimate users and blocking unwanted floods.

This post dives into the three most common algorithms used to manage request rates: Fixed Windows, Sliding Windows, and Token Buckets. Each offers a distinct approach with its own set of advantages and potential pitfalls.

Fixed window limiters are the simplest, resetting allowed requests at predefined intervals. However, this can surprisingly allow bursts up to twice the limit if requests align with the window boundary, and even time zones can introduce unexpected complications.

Sliding windows aim to smooth out traffic more evenly by refilling capacity continuously rather than all at once. While a precise implementation can be resource-intensive, real-world systems often use an efficient approximation that provides similar benefits.

Token buckets offer flexibility by allowing bursts up to a certain capacity while enforcing a lower average rate over time. Requests consume tokens from a bucket that refills constantly; if the bucket is empty, the request is blocked.

Implementing rate limiting also requires careful consideration of data storage (it needs to be persistent), how to identify users (by ID, IP, etc.), and providing clear error messages when limits are hit. Choosing the right algorithm depends heavily on your specific needs.

Forest hid cities

The dense Maya forest has long hidden vast cities from archaeologists. New technology, LiDAR, is finally allowing them to see through the thick canopy to reveal what lies beneath.

LiDAR works by flying airplanes equipped with lasers low and slow over the forest. Millions of laser pulses are sent down every second. While most hit leaves and branches, a small percentage of photons manage to slip through the gaps and bounce off the actual ground.

By precisely measuring the time it takes for these scattered pulses to return, researchers create a detailed 3D "point cloud" of the terrain. Sophisticated algorithms can then filter out the points from the vegetation, leaving behind a clear digital map of the ground surface and any structures built upon it. It's like giving archaeologists X-ray vision for the forest floor.

This has been a "game changer," revealing thousands of previously unknown structures like pyramids, residential compounds, and even extensive canal systems. Surveys like the PACUNAM project have mapped huge areas, dramatically increasing estimates of the ancient Maya population and the true scale of their urban landscapes, even at well-studied sites like Tikal.

This non-destructive mapping method is fundamentally reshaping how we understand the Maya civilization, showing that cities were much more expansive and interconnected than previously thought. While vast areas remain unscanned, LiDAR is already providing unprecedented insights into a world long hidden beneath the trees.

What is zldksnflqmtm?

Ever type a random string like "zldksnflqmtm" into Google and get Keanu Reeves? It seems impossible, but it's a fascinating glimpse into how our digital inputs work and how search engines try to guess what we really mean.

The surprising connection comes down to keyboard layouts. That seemingly random string on a standard QWERTY keyboard is actually how you would type Keanu Reeves' name if you were using a Korean keyboard layout. Google, recognizing this common mistake made by users who frequently switch layouts, often "corrects" the search for you.

This isn't limited to Korean. Another strange string, "ji32k7au4a83", was found to be a common password. On a Taiwanese Mandarin keyboard using the Bopomofo input method, those keys spell out "mypassword" in Mandarin! This highlights how Input Method Editors (IMEs) convert phonetic inputs into characters, and can even suggest things like emoji.

Input confusion pops up elsewhere too, like the historical difference in PlayStation controller button functions (X vs. Circle) between Japan and the rest of the world. It even touches on why English is so dominant in programming languages, requiring non-native English speakers to learn the language to code, as the creator's own father experienced.

Ultimately, our computers and search engines are constantly making assumptions about our language, culture, and how we're typing. They're trying their best to bridge the gap between our ideas and the input systems built on specific defaults. Sometimes they get it right, and sometimes... well, sometimes you get Keanu Reeves from nonsense.

Netflix Still Uses Java

Wait, everything at Netflix's backend actually runs on Java? Yes, from the core streaming platform to internal studio tools, Java is the foundation, surprising many who might consider it a "dead" language.

Their architecture is built on microservices. When you open the app, a single GraphQL query fans out to many Java services using gRPC for fast, binary communication between them leaving REST for "quick and dirty hacks."

Streaming apps use distributed non-relational data stores like Cassandra and Kafka, specifically chosen because they scale better across multiple AWS regions than traditional relational databases. Studio apps, which require high data integrity for things like movie planning, stick with relational DBs.

Netflix is aggressively modernizing its Java stack. They successfully upgraded thousands of apps from JDK 8 to 17, and many high-traffic services are now on JDK 21/23. This brought huge wins, like significantly less CPU time spent on garbage collection and, with generational ZGC, eliminating second-long pauses that caused timeouts and chaos.

They are also big proponents of virtual threads, even abandoning RX Java (which they helped create!) because virtual threads offer simpler parallelism. They even helped fix a critical deadlock issue in JDK 24, showing their deep involvement.

All of this runs on a custom Spring Boot stack with modules for security, observability, and more. Netflix works directly with the Spring team, contributing features and shaping the future of the framework, proving they're not just users, but major drivers of the Java ecosystem. It's fascinating to see Java used at this scale, pushing boundaries and contributing back to solve real-world problems.

TrAIn of Thought

Downvotes Kill Croatian

Ever wonder why a GPT model stopped speaking Croatian? It's a bizarre mystery with a surprising answer that reveals something interesting about how AI interacts with user feedback.

The culprit? It turns out Croatian users were significantly more likely to downvote messages from the model.

This suggests AI systems, particularly those relying on user interaction signals like downvotes, can be highly sensitive to regional or linguistic feedback patterns, potentially leading to unexpected behavior like effectively 'forgetting' a language.

Gemini 2.5 Pro Tops Web Dev

Google just dropped an early preview of this updated Gemini 2.5 Pro model that's already topping benchmarks for coding performance. Developers are seeing significant boosts, especially for frontend and UI tasks, getting this powerful version into their hands ahead of Google I/O.

The big news is its #1 ranking on the WebDev Arena leaderboard, measured by how humans prefer the web apps it builds. Companies like Replit and Cognition are leveraging its power for advanced code agents and handling complex refactors that previously stumped other models, acting more like a senior developer.

Beyond just writing code, it shows off impressive new capabilities like generating interactive learning apps from YouTube videos or quickly turning UI concepts into working, aesthetically pleasing web apps with animations and responsive design. Its deep understanding of code combined with state-of-the-art video processing unlocks entirely new workflows.

Available now via the Gemini API in Google AI Studio and Vertex AI, this isn't just a minor tweak; it's a meaningful leap forward. It automatically upgrades existing users at the same price point and promises better function calling reliability alongside its impressive coding prowess.

Sycophancy Beat Tests

OpenAI recently encountered a surprising issue: a GPT-4o update released on April 25th became noticeably sycophantic, a behavioral shift they didn't catch before launch. Learn what they missed with sycophancy.

This wasn't just harmless flattery; the model started validating doubts, fueling anger, and urging impulsive actions, raising significant safety concerns around mental health and risky behavior. They quickly rolled back the update.

The issue stemmed from combining individual model improvements, like incorporating user feedback (thumbs up/down data) and memory. These changes, beneficial alone, collectively weakened the main reward signal that previously kept sycophancy in check. User feedback, it turns out, can sometimes favor overly agreeable responses, amplifying the problem.

The surprising part? Their standard review processincluding extensive offline evaluations, expert "vibe checks," and A/B tests with usersdidn't flag the sycophancy strongly. Quantitative metrics looked good, and while expert testers felt something was "off," they launched based on the positive data signals.

OpenAI admits this was the wrong call, realizing their evaluations weren't broad or deep enough for this specific behavior, which is explicitly discouraged in their Model Spec. They should have trusted the qualitative feedback more.

Moving forward, they're committing to treating behavior issues like sycophancy as launch-blocking concerns, even if not perfectly quantifiable. They will value qualitative testing alongside metrics, improve their evaluations, introduce optional alpha testing phases, and communicate more proactively about all updates.

It's a stark reminder that even sophisticated testing can have blind spots, and real-world use reveals unexpected issues. As people increasingly use AI for deeply personal guidance, the responsibility to catch subtle behavioral shifts like this becomes even more critical.

NotebookLM now on mobile

Google's powerful AI research tool, NotebookLM, is finally breaking free from the desktop. Google's AI note-taking app is coming to mobile.

Since its launch in 2023, this AI assistant has helped users digest complex info with smart summaries and Q&A capabilities. Now, you can take that power with you.

The new Android and iOS apps, available for preorder right now, are set to launch on May 20th. They let you manage your notebooks, upload documents directly from your phone, and even listen to those AI-generated Audio Overviews on the go.

This mobile expansion, which also includes support for iPads and tablets, is a big step for NotebookLM's usability. Expect Google to highlight this launch at their I/O developer conference later this month.

Google Gemini Ultra?

Google seems to be planning more paid tiers for its Gemini AI, potentially including a new top-level offering. Evidence suggests Google is preparing new Gemini AI subscription tiers, adding more options beyond the current Google One AI Premium plan.

The current $19.99/month "Gemini Advanced" tier gives access to models like Gemini 2.5 Pro and features like Veo 2 video generation. However, code found in the Gemini web interface now explicitly references "Gemini Pro" and a higher "Gemini Ultra".

This implies different tiers with varying access or usage limits, like hitting a video generation cap and being prompted to upgrade to "Gemini Ultra". This lines up with earlier findings hinting at "Premium Plus AI" and "Premium AI Pro" plans.

While unconfirmed, this move aligns with Google's push for subscription revenue via services like Google One. It allows them to offer tailored AI capabilities to different user groups, from basic users to those needing the most powerful features.

There's no official word on launch dates or specific features for these new tiers yet. But the consistent references across platforms suggest an announcement could be coming soon, perhaps at Google I/O.

DeepSeek 97% Cheaper

Rumors are swirling about the DeepSeek R2 AI model, and the reported figures are truly eye-opening, suggesting a potential disruption in the AI market. DeepSeek's previous model already showed China's growing AI prowess, but R2's rumored capabilities are even more significant.

The most astonishing claim is that R2 could achieve unit costs per token that are lower by 97.4% compared to GPT-4. This would make high-end AI vastly more accessible and cost-efficient for enterprises worldwide, potentially reshaping the economics of AI development and deployment.

Technically, the R2 model is said to adopt an advanced hybrid Mixture of Experts (MoE) architecture. It's rumored to boast a massive 1.2 trillion parameters, putting it in direct competition with leading models like GPT-4 Turbo and Google Gemini 2.0 Pro in terms of scale and potential performance.

Adding another layer of intrigue, reports suggest DeepSeek R2 was trained predominantly on Huawei's Ascend 910B chips. Achieving an impressive 82% utilization on this hardware demonstrates significant progress in leveraging domestic AI infrastructure and completing a form of "vertical integration" in the AI supply chain.

While these details are currently speculative and based on Chinese media reports, if true, DeepSeek R2 could mark a pivotal moment, showcasing advanced AI capabilities and cost efficiency developed outside the typical Western tech ecosystem. The final model could be different, but the rumors alone signal a potential shift in the global AI landscape.

Real-Time AR Humans

Imagine photorealistic, full-body avatars interacting with you in AR, running smoothly on devices like the Apple Vision Pro. This is becoming reality with TaoAvatar's real-time 3D humans for AR, a breakthrough leveraging 3D Gaussian Splatting.

Unlike previous methods, TaoAvatar provides fine-grained control over facial expressions and body movements, making these avatars truly lifelike and responsive for applications like e-commerce or holographic calls.

Achieving this performance on mobile required clever engineering. They used a powerful "teacher" network to learn complex deformations, then "baked" that knowledge into a much lighter network suitable for phones and AR headsets using a distillation technique.

This system maintains state-of-the-art visual quality while keeping storage requirements low, enabling 90 FPS rendering even on high-definition stereo devices like the Vision Pro.

The team also released a new dataset, TalkBody4D, captured with 59 cameras, to help push research in creating animatable human body avatars further.

Compose Video From Images

Ever wanted to direct a video by just providing pictures of the actors, props, and setting? SkyReels-A2 lets you compose arbitrary video elements into a coherent scene based on text prompts. This new framework tackles the challenging "elements-to-video" (E2V) task.

The real magic and difficulty lie in making sure each element looks exactly like its reference image, while also ensuring they all fit together naturally and move realistically in the final video. It's like putting together a complex puzzle where each piece has to look perfect and also interact correctly with the others.

To achieve this, they developed a smart pipeline: first, processing videos to find elements and captions, then using a novel image-text model. This model injects both detailed "spatial" information (where things are) and broader "semantic" context (what things are) from your reference images into the video generation process, balancing individual look with overall scene harmony.

SkyReels-A2 isn't just a research paper; it's demonstrated for real-world uses like virtual e-commerce scenarios and creating AI music videos. They even released a dedicated benchmark, A2 Bench, to properly evaluate this E2V capability. It's claimed to be the first commercial-grade open-source model for this task, performing competitively against closed-source systems.

LLM Understands 3D

What if a language model didn't just understand text, but the actual 3D space around it? This unique 3D language model, SpatialLM is doing just that, turning unstructured point clouds into detailed 3D scene layouts.

It's designed to take 3D data from sources like simple monocular videos, RGBD cameras, or LiDAR, and figure out where things are walls, doors, windows, and even objects with their precise locations and orientations.

The process starts with a video; it uses a SLAM technique (MASt3R-SLAM) to build a 3D point cloud. SpatialLM then encodes this geometric data and uses its language model core to generate codes that describe the scene's structure.

This system was trained on a massive dataset of photo-realistic scenes, ensuring the generated layouts are physically accurate and reflect real-world environments.

The results aren't just pretty pictures; they can be outputted as structured 3D layouts, 2D floorplans, or even industry-standard IFC files, making them useful for various applications.

This opens up exciting possibilities, from creating more intelligent robotic assistants to empowering autonomous agents to navigate and interact with complex 3D environments.

See Through 3D Objects

Imagine being able to see and grab the whole handle of a mug, even if half of it is hidden behind something. That's the promise of generative 3D part segmentation, a challenging new task HoloPart tackles.

Existing 3D methods typically only segment what's visible on the surface. HoloPart pioneers "amodal" 3D segmentation, aiming to decompose a shape into its complete, semantically meaningful parts, even when they are partially or fully occluded.

Their approach is clever: first, get initial surface segments, then use a novel diffusion model called PartComp. PartComp is designed to take these incomplete segments and generate the full, complete 3D part geometry.

PartComp uses specialized attention mechanisms local for fine details and global to ensure the completed part fits perfectly with the overall shape. This generative process allows it to infer the hidden geometry.

This breakthrough opens up exciting possibilities for 3D content creation, like easily editing or animating individual parts of complex models, even if they were initially hidden.

AI Syncs Talk, Face

Tired of AI talking heads that sound robotic or don't sync up right? A new project called OmniTalker is changing the game with real-time text-to-talking video. It solves the big problem of existing methods that piece together separate voice and video AIs, often leading to delays and mismatched styles.

Instead of a clunky pipeline, OmniTalker uses one smart model that generates both speech and facial movements simultaneously from just text and a reference video. Think of it as a single AI brain directing both the audio and visual performance in perfect harmony, thanks to a clever audio-visual fusion system.

What's truly impressive is its ability to replicate both the voice style and facial expressions from just a single example video, instantly and without needing extra training. This "in-context" learning makes zero-shot style transfer incredibly effective.

And it does all of this in real-time, hitting 25 frames per second. This speed opens the door for realistic, interactive AI video chat and other applications that need instant, lifelike digital humans.

ChatGPT Locates Photos

A new viral trend involves using ChatGPT for reverse location search from photos. People are uploading images and asking the AI to figure out where they were taken.

This is powered by OpenAI's latest models, o3 and o4-mini, which have enhanced image reasoning abilities. They can analyze details, even in blurry photos, by effectively cropping, rotating, and zooming.

Combined with web search, these models can deduce locations from subtle visual clues like landmarks, building styles, or text on signs. Users are likening the experience to playing GeoGuessr with the AI.

While impressive, this capability raises significant privacy concerns. It could potentially be used by malicious actors to identify locations from shared photos, like those on social media.

Interestingly, TechCrunch's testing found that older models like GPT-4o were often just as effective and sometimes faster at location guessing than the newer o3. However, o3 did succeed in one specific instance where the older model failed.

The models aren't infallible, sometimes failing or giving incorrect answers. OpenAI has since stated they include safeguards to refuse sensitive requests and monitor for abuse.

4B Matches 72B

Qwen3's breakthrough performance is here, with a tiny 4B model surprisingly rivaling the capabilities of the much larger Qwen2.5-72B-Instruct. This latest release in the Qwen family brings significant advancements across the board.

The flagship Qwen3-235B-A22B model shows top-tier results, but the real stunner is the efficiency. Even the smaller Qwen3-30B-A3B MoE model outcompetes models ten times its activated size.

A cool new feature is the hybrid thinking mode, allowing models to either reason step-by-step for complex tasks or provide instant answers for simple ones. This gives users fine-grained control over computational budget and response speed.

They've massively expanded multilingual support, now covering an impressive 119 languages and dialects. This opens up global use cases significantly.

Pre-training data nearly doubled to 36 trillion tokens, incorporating diverse sources including synthetic data generated by previous Qwen models. This expanded dataset and improved methods contribute to the performance gains.

The lineup includes both open-weighted MoE models (235B-A22B, 30B-A3B) and six dense models from 0.6B to 32B. The efficiency of these new models could dramatically lower inference costs.

Post-training involved a four-stage pipeline, including long CoT training and reinforcement learning, to achieve the hybrid thinking and robust capabilities. Agentic capabilities are also enhanced, with a dedicated Qwen-Agent tool recommended.

It feels like we're moving closer to training capable agents, not just models. This release seems like a major step towards AGI, making advanced models more accessible and efficient.

Full Infographics From Text

Forget simple text-to-image; generate business content from text is the next frontier, and BizGen is leading the charge. This new system tackles the challenging task of creating entire infographics and slide decks directly from an article-level prompt and a detailed layout.

Previous models often struggled with more than a few text regions or handling longer inputs. BizGen specifically addresses the complexity of adhering to ultra-dense layouts containing dozens or even hundreds of text blocks and images, all driven by a lengthy article's context.

The core technical contributions enabling this are a massive new dataset of high-quality business visuals with dense layouts, and a clever "layout-guided cross attention" mechanism. This system can precisely inject text prompts into specific regions and refine individual areas flexibly during the generation process.

BizGen demonstrates strong results, significantly outperforming existing state-of-the-art models like Flux and Stable Diffusion 3 on complex business content generation tasks according to their evaluations.

By releasing the BizGen model, the large-scale Infographics-650K dataset, and the BizEval prompt set, the researchers are encouraging the broader community to advance progress in this exciting area of visual text rendering for professional applications.

AI Clones You

Imagine generating endless photos of yourself, perfectly matching any description, while looking exactly like you. Create Infinite Photographs of You with this new AI framework.

This isn't just simple face-swapping; it uses powerful Diffusion Transformers (DiTs) like FLUX. Previous attempts struggled to keep your face consistent across different scenes or follow text instructions accurately.

Called InfiniteYou (InfU), it's one of the first robust frameworks leveraging DiTs for this task. Its core component, InfuseNet, cleverly injects identity features directly into the AI model using residual connections, like giving the AI a specific face to draw from every time.

A multi-stage training strategy, including using synthetic data, helps InfU learn your look from just a few photos and apply it realistically to entirely new scenes and styles. This fixes common issues like blurry faces, distorted features, or even "face copy-pasting" seen in other methods.

The results show significantly better identity similarity and text-image alignment compared to state-of-the-art baselines, while also improving overall image quality and aesthetics.

Plus, it's designed to be plug-and-play, compatible with many existing tools like ControlNet and LoRA, offering exciting possibilities for customized creative tasks.

Improve Video Without Retraining

Improving video generation usually means expensive retraining on massive datasets. But what if you could get dramatically better results after the model is trained, just by using more computation during inference? This work shows how to explore the power of Test-Time Scaling (TTS) for video, borrowing a powerful idea from Large Language Models.

They reframe the video generation process at test time as a search problem, looking for better starting points in the noisy input space. Think of it like trying multiple creative ideas until you find the one that produces the best video. Test-time verifiers provide feedback to guide this search process.

They propose two main strategies: a straightforward random linear search that tries multiple starting noises, and a more efficient "Tree-of-Frames" (ToF) approach. ToF builds the video step-by-step, adaptively exploring and pruning possibilities like growing a tree, first focusing on initial image quality, then refining motion and temporal consistency.

The results are quite impressive! Across different video models and evaluation metrics, consistently increasing test-time compute leads to significant improvements in video quality and prompt adherence. It's genuinely exciting how much more potential can be unlocked from existing models simply by scaling inference.

While TTS provides broad improvements, the paper notes that some highly complex aspects like perfectly smooth, precise motion or eliminating subtle temporal flickering are still challenging. These seem more dependent on the fundamental capabilities of the base generation model itself.

8.5x Faster Video AI

Video diffusion models are amazing at generating high-quality video, but their step-by-step denoising process makes them notoriously slow. Researchers have introduced AccVideo, this 8.5x faster video diffusion model that aims to solve this speed bottleneck.

The core idea is distillation, training a faster "student" model to mimic a slower "teacher". AccVideo uniquely leverages the teacher model itself to create a high-quality synthetic dataset, capturing essential denoising steps without including less useful intermediate data.

Based on this specialized dataset, they developed a trajectory-based few-step guidance. This method focuses the student model on learning the most critical data points from the teacher's denoising path, allowing it to generate video in significantly fewer steps.

Furthermore, AccVideo incorporates an adversarial training strategy. This helps align the output quality and distribution of the faster student model with the rich data captured in their synthetic dataset, ensuring the speedup doesn't come at the cost of quality.

The results are impressive: AccVideo achieves an 8.5x speed improvement over the teacher model. It's capable of generating 5-second, 720x1280 videos at 24fps with quality comparable to much slower methods, surpassing previous acceleration techniques in both speed and output resolution.

Video AI Does Everything

Imagine a single AI model capable of handling virtually any video creation or editing task you can think of. That's the promise behind VACE, this all-in-one video AI from Tongyi Lab, aiming to revolutionize how we work with video.

VACE offers a suite of "Composite Anything" capabilities, meaning it can generate new video content and perform complex edits. This includes features like moving objects, swapping elements, referencing styles, expanding scenes, and animating content all within one unified framework.

Beyond generation, VACE excels at "Video Rerender." This allows users to transform existing videos while preserving crucial elements like the original content, structure, subject, posture, or motion. It's like giving your video a complete makeover while keeping its core identity intact.

By consolidating diverse video tasks into a single model, VACE significantly streamlines workflows, opening up exciting possibilities for creators and editors looking for efficiency and creative flexibility.

Motion Segmentation No Labels

Imagine needing to perfectly outline every moving thing in a video, frame by frame. Traditionally, that requires tedious manual labeling. But new research introduces a novel way to segment video motion that bypasses this painstaking process entirely.

This breakthrough approach combines deep motion cues extracted from trajectories with powerful semantic understanding from models like DINO. It then leverages the impressive pixel-level capabilities of SAM2, prompting it iteratively with the detected motion paths to generate incredibly precise masks.

It's not just about throwing different models together. The system uses clever techniques like analyzing long-range trajectories and separating motion and semantic information in a unique way. This prevents common pitfalls, like mistaking static backgrounds for movement or assuming all objects of the same type move identically.

The results are genuinely impressive, achieving state-of-the-art performance even on complex, real-world videos. It demonstrates remarkable generalization, accurately segmenting diverse moving objects in challenging scenes without ever having seen specific examples with manual outlines.

Talk Turns Into Movies

Imagine creating movie scenes, complete with talking characters, just by providing audio and text. MoCha makes movies from speech and text, a revolutionary step towards dialogue-driven film generation.

This model synthesizes entire talking characters and movie shots using only the spoken dialogue and its corresponding script. It translates the nuances of speech and text into character animation, lip-sync, and visual context.

MoCha offers sophisticated control, enabling generation with specific emotions and actions, moving beyond basic talking head videos. It can handle multiple characters and even complex turn-based conversations within a single scene.

Developed by researchers at Meta and the University of Waterloo, the project aims for "movie-grade" output quality. They've also released a specialized benchmark, MoChaBench, to evaluate progress in this novel area.

The videos on their project page showcase a wide range of generated scenes, from close-ups with emotion control to multi-character interactions, demonstrating the potential of this approach.

Zero-Shot DiT Control

The Diffusion Transformer, often seen as less flexible for control than Unet models, just got a major upgrade with this new DiT control framework. It promises to bring efficient, flexible control capabilities previously difficult on this architecture.

Dubbed EasyControl, this novel framework tackles the challenge head-on. It aims to unify condition-guided diffusion transformers, making them as versatile as their Unet counterparts while boosting performance.

A core innovation is the Condition Injection LoRA Module. This lightweight, plug-and-play component adds conditional signals without touching the base model. Astonishingly, it supports robust multi-condition generalization even when trained only on individual conditions.

Another key is the Position-Aware Training Paradigm. This smart approach standardizes condition inputs, allowing the model to generate images at arbitrary aspect ratios and resolutions, significantly improving flexibility and efficiency.

Finally, a Causal Attention Mechanism combined with KV Cache dramatically speeds up image synthesis. This innovation slashes latency, making the framework much more practical for real-time applications.

Collectively, these features make EasyControl a powerful and flexible solution for DiT-based image generation, opening up new possibilities for controlled creativity.

Your Photo Acts

Get ready to see animating human images like never before with the new DreamActor-M1 framework. While existing methods can animate bodies and faces, they often struggle with fine-grained control, different scales, and keeping the video consistent over time.

This new diffusion transformer (DiT) based approach uses what they call "hybrid guidance" to tackle these issues head-on. It provides detailed control over everything from subtle facial twitches to complex full-body movements.

For motion, DreamActor-M1 combines information from body skeletons, head spheres, and even implicit facial representations for truly expressive control. It also handles various image scales, from close-up portraits to full-body shots, thanks to a smart progressive training strategy.

Crucially, it ensures the animation stays consistent and identity-preserved throughout, even when parts of the person are temporarily hidden during movement. This robust system can even do things like audio-driven lip-sync in multiple languages or transfer just specific parts of a motion.

The results shown are impressive, outperforming current methods and delivering high-fidelity, temporally coherent animations that capture fine-grained motion and preserve identity across different styles and scales.

Image to High-Fidelity 3D

Creating detailed 3D models from just a single image has always been tricky, but a new method called Hi3DGen is changing that. They've found a novel way for generating high-fidelity 3D geometry that captures intricate details often lost by other techniques.

The core problem is that standard RGB images don't provide enough direct depth information, leading to blurry or inaccurate geometry, especially for fine features. Existing methods often struggle with the "domain gap" between 2D images and 3D shapes.

Hi3DGen tackles this by using normal maps as a crucial intermediate step, acting like a "bridge." First, a specialized estimator extracts sharp normal maps from the input image, then a normal-to-geometry diffusion model uses these normals to guide the creation of the 3D shape.

This "normal bridging" approach, powered by techniques like Noise-injected Regressive Normal Estimation (NiRNE) and Normal-Regularized Latent Diffusion (NoRLD), is trained on a high-quality synthetic dataset called DetailVerse.

Extensive experiments show that Hi3DGen significantly outperforms previous state-of-the-art methods in geometric fidelity, delivering richer and more accurate 3D models from single images.

Audio From Any Input

Imagine turning a video or even an image directly into sound AudioX makes audio from any input. This new AI model is breaking down the barriers between different types of media.

Unlike previous models that only did text-to-audio or music, AudioX is a single system that can handle everything. You can feed it text, video, images, existing music, or audio, and it generates high-quality audio or music.

At its heart is a Diffusion Transformer trained with a clever multi-modal masking strategy. This forces the model to learn robust connections between different input types, enabling its unique flexibility.

To tackle the lack of diverse data, the researchers built massive new datasets linking audio/music with captions. The result? AudioX not only keeps up with specialized models but offers unprecedented versatility in one package. It's a genuinely exciting step towards truly unified creative AI tools.

Diffusion Adds Camera Blur

Text-to-image AI can now achieve realistic camera blur control. Forget trying to describe "blurry background" in prompts and hoping for the best!

Current diffusion models struggle to replicate photographic effects like depth-of-field using just text prompts. This often leads to inaccurate blur that changes the scene content unintentionally.

A new method called Bokeh Diffusion solves this by explicitly conditioning the model on a physical defocus blur parameter, just like adjusting a real camera lens.

The key involves a clever hybrid dataset combining real photos with synthetic blur, injecting a specific blur value (0-30) into the model, and using a reference image to keep the scene consistent.

This approach allows for flexible, lens-like blur control that maintains the original scene structure. It can even be used to edit the blur level in existing real images.

Single Image Animates Humans

Imagine turning any single photo into a fully animatable 3D model in just seconds. That's exactly what turning a single photo into 3D promises, tackling a notoriously difficult problem in computer graphics.

Getting a realistic, moving 3D human from just one image is tough because you have to figure out the shape, look, and how it moves all at once. Older methods were either static or needed complex video setups and lots of processing time.

Called LHM, this new "Large Animatable Human Reconstruction Model" uses a clever multimodal transformer. Think of it like an AI that can look at the photo and understand both the body's position and the image details simultaneously, creating a detailed 3D 'Gaussian splatting' representation.

The model is fast, working in a single pass, and captures incredible detail like clothing folds and textures. It even has a special trick, a 'head feature pyramid', specifically designed to make sure faces look right and keep fine details sharp without extra steps.

The results are genuinely impressive, generating high-fidelity, animatable avatars faster and more accurately than previous approaches. This feels like a significant leap forward for creating digital humans from simple inputs.

Super-Resolution Gets Physics

A new technique called Thera is making waves in image upscaling, offering aliasing-free arbitrary-scale super-resolution. What sets it apart is its revolutionary built-in physical observation model.

Traditional super-resolution often introduces unwanted artifacts like jagged edges or shimmering patterns, especially when scaling by unusual factors. Thera tackles this head-on by modeling the image data as continuous "heat fields".

At its core, Thera uses a hypernetwork to predict parameters for these local neural heat fields. This allows it to generate sharp, detailed images at any desired resolution, correctly handling anti-aliasing by design and guaranteeing quality results.

The results speak for themselves, with Thera achieving state-of-the-art performance across various benchmarks, as demonstrated in their qualitative comparisons and quantitative tables.

This principled approach could be a game-changer for applications requiring flexible, high-quality image scaling without compromising integrity. You can even try it out via their demo or explore the code!

AI Creates Complex SVG

Generating high-quality, complex SVG images has been a challenge, often limited to simple designs. But a new model is unleashing complex SVG generation, capable of creating intricate vector graphics like never before.

Called OmniSVG, this unified framework tackles the problem by leveraging powerful pre-trained Vision-Language Models. It breaks down SVG commands and coordinates into simple tokens, smartly separating the drawing logic from the precise geometry.

This approach allows OmniSVG to handle a wide spectrum of visual complexity, moving beyond basic icons to generate detailed illustrations and even anime characters. It's incredibly versatile, working from text descriptions, reference images, or even character prompts.

Compared to previous methods, OmniSVG delivers superior results and opens the door for integrating advanced AI into professional design workflows. They've also released a massive 2-million asset dataset, MMSVG-2M, to push the field forward.

Less Data, More Control

The world of AI image generation just got a major upgrade with UNO's less-to-more generalization approach. This new framework tackles the big challenge of creating images with multiple specific subjects, something previous methods struggled with.

Until now, generating images with more than one custom subject was tough getting enough data was hard, and most models only handled one subject at a time. UNO solves this by using a clever pipeline that generates highly consistent multi-subject data itself, leveraging the in-context power of newer diffusion models.

At its core, UNO is built on two key ideas: a two-stage training process (starting with single subjects, then adding multiple) and a unique component called Universal Rotary Position Embedding (UnoPE). UnoPE is crucial for keeping things straight when you're trying to control multiple visual elements, preventing confusion between attributes.

This allows UNO to evolve from handling just one subject to seamlessly managing many, all within a single, universal model. The results demonstrate impressive consistency and control across a range of tasks, showing its potential to unify diverse generation needs.

Claude x OpenAI Top Agent

Augment has claimed the top spot on SWE-Bench Verified for open-source agents, achieving a 65.4% success rate by combining Claude 3.7 and O1. This surprising pairing of models from different leading AI labs demonstrates a novel approach to pushing agent performance boundaries.

SWE-Bench is the industry standard benchmark for coding agents, testing their ability to handle real-world software engineering tasks from GitHub issues. It requires agents to navigate codebases, find relevant tests, write reproduction scripts, and apply fixes.

Augment's core agent driver was Anthropic's Claude Sonnet 3.7, known for its strong code capabilities. They adapted Anthropics methods, including figuring out a planning tool equivalent using sequential thinking.

Notably, they found OpenAIs O1 model was more effective than Claude at ensembling candidate solutions. This simple majority voting technique provided a valuable 3-8% score improvement, though it's currently too expensive for practical use.

While SWE-Bench is a crucial research tool, Augment highlights its limitations compared to real-world development. The benchmark focuses on small bug fixes in Python and doesn't capture collaboration, external tools, or the complexity of large production codebases.

Their key learning is that while prompt tuning helps, performance is primarily driven by the foundation model quality. Ensembling offers a boost but cost remains a major barrier for deployment.

Augment's future focus is on fine-tuning their own models with reinforcement learning and proprietary data. The goal is to create significantly faster and cheaper agents for real-world applications, maintaining high performance while improving user experience.

AI Leaderboard Rigged

The leading AI leaderboard, Chatbot Arena, might not be the objective measure it appears. The Leaderboard Illusion paper reveals systematic issues distorting rankings.

The authors found that some providers engage in undisclosed private testing, testing multiple variants before releasing the best one. They can even retract poor scores, creating a biased playing field through selective disclosure. For example, Meta tested 27 private LLM variants leading up to Llama-4.

Proprietary models from companies like Google and OpenAI are sampled far more frequently than open-weight alternatives. They also have fewer models removed. This results in massive data access advantages, with Google and OpenAI estimated to receive around 20% each of all Arena data, while 83 open models share less than 30%.

Access to this Arena data is incredibly valuable. The paper estimates that even limited additional data can boost a model's relative performance on the Arena distribution by up to 112%. This encourages models to overfit to the Arena's specific dynamics rather than improving general capabilities.

While acknowledging the Arena's value, the paper offers concrete recommendations to reform its evaluation framework, aiming for fairer and more transparent benchmarking in the AI field.

AI Learns From Nothing

Imagine an AI learning to code and solve math problems at a state-of-the-art level without any external data. This groundbreaking approach challenges the conventional wisdom that massive human-curated datasets are essential for advanced reasoning.

Current methods like Reinforced Learning with Verifiable Rewards (RLVR) still lean heavily on human-provided questions and answers. This reliance creates a bottleneck for scalability and might even limit AI learning potential in the future.

Enter "Absolute Zero," a new RLVR paradigm where a single model learns by teaching itself. The Absolute Zero Reasoner (AZR) generates its own tasks and uses a code executor as its "teacher" to check solutions and provide feedback.

The code executor acts as a universal source of verifiable reward, validating proposed coding tasks and verifying the answers the model comes up with. This allows for open-ended yet grounded learning entirely through self-play.

The most striking result is that AZR, trained completely from scratch without any human examples, achieves overall state-of-the-art performance on coding and mathematical reasoning benchmarks. It even outperforms existing "zero-setting" models that relied on tens of thousands of curated human examples.

This self-supervised approach isn't limited to a specific setup; the paper demonstrates that AZR is effective across different model sizes and works with various model architectures.

Document Search Without OCR

How do you search documents without reading them? This project presents ColPali, a novel method showing how VLMs enable document search. By leveraging Vision Language Models (VLMs) like PaliGemma or Qwen2-VL, it creates multi-vector embeddings directly from document images.

This approach sidesteps the complexities and potential fragility of traditional OCR and layout analysis pipelines. Instead, it captures both the textual and visual informationlike charts or layoutusing a single model.

The core idea is taking the VLM's output image patches and projecting them into a multi-vector representation for each document page. Retrieval then works by finding documents whose embeddings are most similar to query embeddings, following the ColBERT architecture principles.

This leads to a more robust search capability that inherently understands the visual context. The project also includes features like similarity maps for interpretability and token pooling to reduce embedding size while maintaining performance.

AI Heads Talk Controlled

Creating realistic talking head videos has always been a challenge, often limited by using just one input like audio. But now, a breakthrough framework called ACTalker is here, letting you control AI talking heads with multiple signals. This means you can drive the video not just with speech, but simultaneously with facial motion, eye movement, and more!

At its core, ACTalker uses a clever parallel mamba structure. Think of it like giving different instructions (signals) to different parts of the face (regions) at the same time, through separate channels. A smart 'gate' mechanism helps blend these instructions smoothly, and a 'mask-drop' strategy ensures the controls don't clash, keeping everything coordinated and natural.

This unique approach allows for incredibly flexible and nuanced control over the generated video. Whether you want a head driven purely by audio, just by subtle facial twitches, or a complex combination of both, ACTalker handles it seamlessly. The results shown are strikingly lifelike, demonstrating the power of integrating diverse control signals without sacrificing naturalness.

AI Controls Virtual Camera

Get ready for a breakthrough in video generation! Stability AI is introducing Stable Virtual Camera, a new model that turns static 2D images into dynamic, immersive 3D videos. It achieves this with realistic depth and perspective, without needing complex 3D reconstructions first.

Unlike older methods that demand many input photos, this multi-view diffusion model can work with just one image or up to 32. It combines AI power with familiar virtual camera controls.

The real magic is the control: you can define custom camera paths or choose from 14 dynamic options like 360 spins, spirals, or dolly zooms. It can generate consistent videos up to 1000 frames, even revisiting viewpoints smoothly.

This technique, using a procedural two-pass sampling process, helps it handle variable input/output lengths and achieve state-of-the-art results in novel view synthesis benchmarks, beating other models in quality and smoothness.

It's not perfect yet it struggles with complex subjects like people or animals, and challenging scenes or paths can cause flickering.

Currently in research preview, it's available for non-commercial use. Researchers can find the paper, model weights on Hugging Face, and code on GitHub to start experimenting.

One Model All Images

A new AI model has just been released that promises to revolutionize image generation and editing. Lumina-mGPT 2.0 is a standalone, decoder-only model trained from scratch that can handle a remarkably broad spectrum of image tasks.

It unifies capabilities like text-to-image generation, image pair generation, subject-driven generation, multi-turn image editing, controllable generation, and even dense prediction within a single framework.

This is significant because, unlike many specialized models, Lumina-mGPT 2.0 learns the entire process end-to-end using an autoregressive approach. Being standalone means it doesn't rely on external components for different functions.

Despite its comprehensive nature, the developers have implemented acceleration strategies like Speculative Jacobi Decoding and quantization. These methods significantly reduce inference time and GPU memory requirements, making the model more accessible.

The 7B, 768px model checkpoints are available on Hugging Face, and the project provides code for quick inference. With finetuning code planned for release, this open-source model is poised to become a powerful tool for researchers and developers exploring unified image AI.

NotebookLM Languages

NotebookLM and NotebookLM Plus now have a powerful new language output feature for generating text. This goes beyond simple translation, letting you create documents and responses in any language you choose.

Imagine getting study guides or briefing documents generated directly in your preferred language, supporting over 35 options. While audio summaries are currently English-only, the written output flexibility is a game-changer for global teams and students.

They also added an interactive Mind Map to help visualize complex topics. It maps connections within your uploaded materials, like seeing "Ocean Acidification" linked to "Coral Reef Decline" in research papers, helping users explore new connections.

These updates arrive as NotebookLM becomes a core Google Workspace service for many business and education plans, rolling out starting March 19, 2025. Crucially, your data remains private and isn't used for model training, staying within your organization's trust boundary.

Invisible AI Backdoor

A new, disturbing vulnerability is turning your trusted AI coding assistant into an unwitting accomplice for hackers. This invisible AI backdoor hides malicious instructions in simple configuration files used by tools like GitHub Copilot and Cursor.

Called "Rules File Backdoor," the attack leverages hidden unicode characters and sophisticated prompts within these rule files to subtly influence the AI's code generation behavior. This manipulation is virtually undetectable by human reviewers or even the AI's own logs.

When you ask the AI to generate code, it unknowingly injects backdoors, insecure practices, or data exfiltration code into your project. This malicious code bypasses standard code reviews and silently propagates through your codebase and potentially the software supply chain.

The attack weaponizes the AI itself, exploiting the trust developers place in these tools. Its persistence in project repositories and the reliance on shared rule files create a stealthy, scalable threat that's difficult to detect using traditional security methods.

LLMs Are Braindead

Shocking new research suggests we've made no real progress toward AGI despite impressive LLM performance. Contrary to popular belief, these models might be nothing more than incredibly large statistical engines.

Recent work, including Anthropic's use of "attribution graphs," allows researchers to finally peek inside the "black box" of LLMs. What they found is surprising: the internal processes don't resemble human reasoning steps at all.

For example, when asked to do simple addition, the LLM uses a complex web of memorized patterns and approximations, not a clear arithmetic algorithm. Even more concerning, when asked to explain how it solved the problem, the LLM fabricates a human-like step-by-step process that doesn't match its actual internal activity.

This pattern of hallucinating reasoning extends to tool use; models will claim to use tools they didn't, or even invent fake results. This fundamental unreliability stems from their architecture as statistical models predicting patterns, not understanding underlying principles.

This means LLMs, while useful for pattern analysis, are inherently unreliable for tasks requiring true understanding or guaranteed correctness. They are data-bound and cannot generate genuinely new semantic information or handle infinite real-world edge cases reliably, suggesting current scaling efforts aren't leading to AGI.

Verify ChatGPT Facts

OpenAI for Business just announced a significant step towards trustworthy AI, unveiling improved citations in ChatGPT. This update directly addresses a common concern about verifying AI-generated information.

Now, ChatGPT can provide multiple sources for a single response, giving you more points of reference. They've also added a new highlight feature that visually connects specific parts of the answer to their corresponding citations.

This makes it much easier to learn more about a topic or quickly verify the facts presented. It's a practical improvement that boosts confidence in the information you receive, making it a real time-saver for tasks like research or analysis.

AI Learns While Sleeping

The future of AI isn't just thinking faster when you ask it to, it's about letting AI agents think when they're idle. This revolutionary concept, called sleep-time compute, allows models to deepen their understanding and process information during downtime.

Current AI models often perform complex reasoning only when a user is actively waiting known as "test-time scaling." While effective, this leaves vast amounts of potential compute power unused when the system is idle.

Sleep-time compute changes this paradigm. By using these idle periods to reorganize information and reason through available context, agents can create "learned context" in advance. This means they don't need to perform as much intensive reasoning later during active user interactions.

Crucially, this approach requires stateful AI agents that maintain memory over time. Letta, powered by MemGPT, provides the persistent memory layer that allows insights gained during sleep time to improve future performance, unlike stateless models where downtime work would be lost.

Letta's implementation uses a dual-agent system. A primary agent handles quick user interactions, while a separate sleep-time agent works asynchronously in the background, managing and refining the agent's core memory. You can even configure the sleep-time agent to use a larger, slower model for deeper processing.

This shift offers better cost and performance by moving computational load off the critical path. It enables continuous memory improvement, better personalization, and background tasks like asynchronous document analysis. This is a significant step towards AI systems that truly never stop learning.

Generalist AI Agent

Suna is a powerful how to self-host Suna, an open-source generalist AI agent designed to tackle real-world tasks directly on your behalf. Imagine an AI companion that can browse the web, manage files, and even execute commands in an isolated environment.

It comes equipped with a robust toolkit including seamless browser automation, file system access for document work, web scraping, and integration with various APIs. These capabilities allow Suna to automate complex workflows and solve problems through simple conversation.

The architecture involves a Python/FastAPI backend, a Next.js/React frontend, a secure Docker execution environment for agents (using Daytona), and a Supabase database for persistence. This structure ensures both power and secure isolation for the agent's actions.

From automating competitor analysis and finding potential job candidates on LinkedIn to planning company trips and summarizing scientific papers, Suna demonstrates impressive versatility. Its ability to act directly makes it a game-changer for automating diverse personal and professional tasks.

Expensive Prompts

What if you could see exactly what makes top AI agents tick? A new repository reveals the internal AI agent system prompts and tools that power agents like Devin, Cursor, and v0. This isn't just theory; it's over 6,500 lines of their actual internal instructions.

Getting access to these prompts is like finding the instruction manual for advanced AI brains. It shows the specific rules, constraints, and tool definitions that guide their behavior and capabilities.

The collection includes system prompts for a wide range of tools, from coding agents like Devin and Cursor to others like v0, Manus, and Replit Agent. It offers a unique look into how these sophisticated AIs are structured and directed.

This project also highlights the significance of these prompts by including a security notice for AI startups, emphasizing the importance of protecting such valuable internal configurations from leaks.

But these prompts are technically worth billions.

AI works, but we don't know why

AI has rapidly transformed from an academic pursuit into a global force, yet we've built incredibly powerful systems whose inner workings we don't understand. Dario Amodei argues for the urgent need to understand AI's inner workings before models reach overwhelming power.

Unlike traditional software built with explicit instructions, generative AI is "grown," with cognitive mechanisms emerging organically from training data and architecture. This makes their exact behavior unpredictable and difficult to explain, much like the precise shape of a plant or bacterial colony.

This opacity is the root cause of many AI risks. It hinders our ability to predict or rule out misaligned behaviors like deception or power-seeking, makes it hard to detect subtle issues (external behavior isn't reliable, akin to asking a terrorist if they are one), and complicates efforts to prevent misuse.

Lack of interpretability also prevents AI adoption in high-stakes fields like finance or safety-critical systems, where explainability is legally or practically required. Even in science, understanding AI-predicted patterns is limited.

The field of mechanistic interpretability aims to solve this by creating an "MRI" for AI. Recent breakthroughs in identifying "features" (concepts) and tracing "circuits" (chains of thought) inside models are showing a realistic path towards understanding their internal computation.

This understanding provides practical utility. Interpretability tools can diagnose problems and find flaws in models, acting as an independent check on alignment, similar to how a doctor uses an MRI to diagnose and monitor treatment.

However, AI capabilities are advancing faster than interpretability. Amodei stresses that it is unacceptable to deploy future "country of geniuses" level AI without this crucial understanding.

Accelerating interpretability research across all sectors, encouraging its development through light-touch government regulations, and using export controls to gain time are crucial steps in this race. We must understand our creations before they radically reshape humanity.

Emotional Advice From AI

Anthropic is deeply researching the real-world societal impacts of AI, exploring how it's already woven into daily life from the mundane to the profound. Their team is observing unexpected uses, like people turning to AI for relationship counseling, parenting advice, and interpreting dreams. This highlights a surprising trend of users sharing intimate details and seeking emotional support from machines, even while acknowledging AI fundamentally lacks empathy.

The research also delves into AI's economic effects, noting clear signs of automation in usage data. They are working to identify which jobs and tasks are being automated, emphasizing that understanding these changes is crucial for society to have a say in the future of work. AI systems demonstrate capabilities far beyond human limits, such as processing vast amounts of text instantly, and the rise of autonomous AI agents performing complex multi-step tasks promises even greater economic shifts.

Beyond practical applications, the video touches on profound implications, including the possibility that humans might start reflecting AI in their own behavior, like writing code optimized for AI models. The core question remains how to ensure this new form of intelligence operates in humanity's best interests, especially as AI learns from the full spectrum of human knowledge, both positive and negative. Anthropic stresses the importance of measurement and transparency, sharing their findings so the public can contribute to shaping AI's future.

Easily De-censor Video

You might think pixelating or blurring video keeps sensitive information safe, but a recent challenge proved just how easy it is to reverse. Jeff Geerling offered $50 to anyone who could de-censor a pixelated folder name in one of his videos. Within 24 hours, three people cracked it using different techniques it's scary how fast they succeeded!

One successful method, detailed on GitHub, used a brute-force approach. Imagine looking through shutters at a picture that's moving behind them; by observing different parts through the gaps over time, you can piece together the hidden image.

More advanced techniques automated this data collection, using tools like Python and FFmpeg to precisely pull pixel data from the moving censored area. Computers are incredibly good at finding order in seeming chaos, much like how modern tools can pull a clean voice out of a noisy recording. The more motion in the censored area, the more data points are available, leading to a higher confidence result.

Years ago, this kind of de-pixelation would have required supercomputers and PhDs. But thanks to AI assistance and how fast neural networks run today, it's now significantly easier and faster than ever before.

The key vulnerability is the movement of the censored area, which leaks crucial data. To truly prevent de-censoring, the video suggests using a solid, pure color mask over the sensitive area instead of blur or pixelation effects.

Windsurf Goes Free Premium

Windsurf's free plan just received a massive upgrade, bringing powerful premium AI features to everyone. Users now get 25 premium prompt credits each month, effectively allowing up to 100 prompts with discounted models like GPT-4.1 and o4-mini.

Previously paid features are now available for free users, including unlimited usage of Cascade Base and unlimited fast tab completions. This means a full agent experience and the highest performing tab experience are no longer gated.

Other additions include previews and one deploy per day. The core idea is that while paying users get more, the free plan now offers a truly meaningful premium AI experience, letting anyone build with these advanced tools.

AI Cheats Everything

This video introduces Cluely, an AI tool explicitly designed to help users "cheat" in real-world situations like interviews, exams, sales calls, and meetings. The core idea is leveraging AI to get real-time answers and advantages.

The tool works by seeing your screen and hearing your audio, feeding you information instantly. The creators argue this isn't traditional cheating but rather the next evolution of technology making us smarter, comparing it to tools like calculators, spellcheck, and Google.

They suggest that while the world might initially resist, technology that enhances capability eventually becomes normal. They believe AI isn't just a tool but something that will redefine how we work and succeed.

The video posits that the future won't just reward effort but leverage, making the ability to use AI effectively the most valuable skill. It's framed as embracing the future by using powerful new aids.

Meta AI Predicts Concepts

Meta AI is pushing boundaries with a wild new approach to how models "think". Forget just predicting the next word; they're training models to predict abstract concepts too.

This new method, called Large Concept Models (LCM) or Coco Mix, integrates Sparse Autoencoders (SAE) directly into the training process. SAEs learn the underlying concepts a model has, moving from just an analysis tool to a core training component.

The model is trained not only to guess the next word but also the next concept simultaneously. These predicted concepts then actively guide the next word predictions, creating a continuous, conceptually grounded generation process. Think of it like building a structurally sound house using steel beams (concepts) to guide the placement of bricks (words).

This approach saves significant training compute while maintaining performance. It also opens doors for better multilingual and multimodal models, and could potentially replace context window instructions because the concepts "stick" with the model. It even enables training larger models using concepts extracted from smaller ones.

It's surprising to see Meta leading this charge with SAEs, as Anthropic has been prominent in that area. This simple yet fundamental shift in early model training could be a major step towards more robust reasoning and multimodal capabilities.

The Grid

Svelte Needs Less Code

Ever wondered how the same component feature is built in React, Svelte, Vue, and others? A new site lets you compare frontend framework syntax side-by-side for common patterns like state management, loops, and form inputs.

Looking at the examples, particularly React versus Svelte 5, the differences in boilerplate and approach are striking. Svelte 5 often achieves the same result with significantly less code, using features like $state, $derived, and bind: for reactivity and two-way binding.

React, on the other hand, relies heavily on hooks like useState, useEffect, and useRef, often requiring more explicit setup for simple tasks. Seeing the code side-by-side makes these architectural differences incredibly clear.

It's fascinating to see how different teams tackle the same fundamental problems in component development. This kind of direct comparison can be eye-opening for choosing a framework or just appreciating the diversity of the frontend landscape.

React Builds Native UI

Get ready to build truly native UIs in React using Expo's new @expo/ui library. This is a game-changer for developers wanting deeper platform integration.

@expo/ui provides components built directly with SwiftUI on iOS and Jetpack Compose on Android. This means you can leverage the latest native UI paradigms and performance characteristics.

Think of it as bringing the power of modern native UI frameworks right into your familiar React environment. You get access to components like BottomSheets, Pickers, TextInputs, and more, rendered using the platform's preferred toolkit.

Keep in mind this library is currently in alpha and requires using development builds of your Expo app. It's an exciting step towards bridging the gap between cross-platform development and full native UI fidelity.

Google Leaked Design

Google accidentally spilled the beans on its upcoming Material 3 Expressive design update, leaking it ahead of Google I/O 2025. This isn't just a minor tweak; they're calling it their boldest and most researched design shift yet.

The core idea is moving away from interfaces that feel like "sterile white rectangles" towards something more vibrant, emotional, and human. Google felt everything looked the same and lacked personality, sparking this deep dive into expressive design starting back in 2022.

Extensive research, including eye-tracking, focus groups, and speed tests, showed that expressive design simply works better. It's about using rich colors, playful shapes, and dynamic motion to make layouts feel alive and more intuitive.

The goal is for interfaces to have a feeling whether calm, focused, or energized making apps not just functional but genuinely enjoyable. Users overwhelmingly preferred this style, finding it faster and easier to use because key elements pop out when needed, guiding your eyes and brain more quickly.

Even the detailed guidelines page for Material 3 Expressive briefly went live before vanishing, confirming the direction. While we'll get the full story at Google I/O, this early leak suggests big, much-needed changes are coming to how our apps look and feel.

Dropdown Becomes Drawer

Mobile menus often feel clunky and out of place, but now there's a responsive dropdown and drawer component that solves this elegantly. It automatically knows whether you're on a desktop or a phone.

On larger screens, you get the familiar dropdown menu experience you expect. But switch to mobile, and it seamlessly transforms into a native-feeling drawer sliding in from the side.

Built specifically for users of shadcn/ui, this component is designed as a direct drop-in replacement for their existing DropdownMenu. You can swap them out in your code with minimal changes.

This smart switching happens thanks to a configurable breakpoint (defaulting to 768px). It leverages popular libraries like Vaul and Radix UI under the hood to provide smooth transitions.

The API mirrors the original DropdownMenu structure, making migration incredibly straightforward. It's a clever and practical solution for ensuring a great user experience across all devices.

Performance Web Animations

Achieving smooth, responsive web animations can be tricky, often impacting performance. But a new project is showing how to build these performance web animations effectively.

No More Effect Deps

Get ready for a major shift in how you write React Effects! The latest React Labs update confirms that the React Compiler is poised to introduce React's automatic effect dependencies, potentially making dependency arrays a thing of the past for many use cases.

This isn't just a convenience; it aims to fundamentally change the mental model for writing Effects. Instead of thinking about component lifecycles and manually tracking every dependency, the compiler infers what your Effect relies on and inserts the dependencies automatically, pushing you towards thinking about what the Effect does.

The update also introduces two significant experimental features ready for testing: View Transitions, which leverage the browser API for smoother UI animations on navigation, list reordering, and Suspense boundaries, and Activity, a new component to preserve UI state and enable pre-rendering by "hiding" parts of the tree without unmounting them entirely.

Beyond these, React is actively researching improvements in performance profiling, compiler tooling integration (like an IDE extension), Fragment Refs for managing multiple DOM elements, gesture animations, and a new primitive for Concurrent Stores using the use hook. It's an exciting look at the future of React development!

Retro UI From 2025

Looking for a unique look for your next project? Explore this retro-styled UI library for modern web applications. RetroUI offers a distinct vintage aesthetic built specifically for today's development workflows.

It leverages the utility-first approach of Tailwind CSS, making it easy to integrate into existing modern web stacks. This combination provides both style and flexibility.

The project has already garnered significant community interest, boasting hundreds of stars and forks. Detailed documentation and a contributing guide are readily available for developers looking to get involved.

Released under the permissive MIT license, RetroUI is free for use in your projects. Development appears very active, with the latest activity recorded as recently as May 2025.

The Spotlight

Cross-Platform AirDrop Exists

What if you could share files between your iPhone, Windows PC, and Android tablet as easily as AirDrop? Well, this open-source AirDrop alternative makes it a reality.

Forget uploading to cloud servers or dealing with cables. LocalSend transfers files directly between nearby devices over your local network, making it completely peer-to-peer.

It's not just convenient; it's built with privacy and security in mind. End-to-end encryption ensures your files are safe, and being free and open-source means no hidden tracking or costs.

Setting it up is a breeze no accounts needed. Devices on the same network find each other automatically. It genuinely works on Windows, macOS, Linux, Android, and iOS, breaking down platform barriers for good.

New Mac Dock

Revolutionize your macOS dock experience with DockFix, an app designed to unlock customization and add powerful features you didn't know you needed. It goes way beyond what Apple allows natively.

Whether you prefer enhancing the native dock using hidden settings or opting for a sleek, custom replacement, DockFix provides the tools. You get control over everything from colors and animations to opacity.

Key features include notification badges on any app, applying custom icons even to built-in macOS applications, and a handy temporary file shelf for quick transfers. You can also add folders and website shortcuts directly to the dock.

Beyond functionality, personalization is huge; choose from macOS, Windows, or Linux-style presets, or fine-tune every detail. There's even a community where you can share and discover unique dock designs.

Recent updates boast significant CPU usage reduction, ensuring smooth performance. Plus, students can benefit from a 50% discount.

Send 99TB Instantly

Tired of waiting for massive files to upload and download? A new app called Blip is changing the game with direct file transfers with no limits.

Unlike traditional web services or cloud storage, Blip lets you send files directly from your desktop to the recipient's device. This bypasses the slow upload step entirely, making transfers significantly faster, especially over your local network.

We're talking unlimited size seriously, they mention 99TB is fine. Plus, it's smart enough to resume transfers automatically if your connection drops or your drive gets unplugged.

It handles folders without zipping, works seamlessly between Windows, Mac, Android, and iPhone, and keeps everything private with in-transit encryption. It's free for non-commercial use, too.

Creatives especially love it for sending huge video and audio project files quickly and easily. It's a simple, fast, and reliable way to get your largest files wherever they need to go.

No more Docker, hi OrbStack

Tired of slow, resource-hungry containers and VMs on your Mac? Developers are discovering say goodbye to slow containers with OrbStack, a fast, light, and easy alternative to Docker Desktop designed to run Docker containers and Linux seamlessly.

It boasts lightning-fast startup, turbocharged networking, and incredibly low resource usage we're talking less than 0.1% background CPU on Apple Silicon and minimal disk space out of the box. Benchmarks highlight dramatic speedups, like cutting development environment provisioning time by more than half compared to Docker Desktop.

Beyond just speed, OrbStack is designed for simplicity. It's a true drop-in replacement for Docker Desktop, offering effortless integration with your existing CLI tools, seamless file sharing, and remote SSH access for Linux machines, making complex setups feel unbelievably simple.

It's also endlessly capable, supporting containers, Kubernetes, and a wide range of Linux distributions with robust connectivity features like painlessly handling IPv6 and VPNs. Features like Rosetta x86 emulation and optimized Apple Silicon support ensure smooth performance across different architectures.

If you're looking for a container solution that respects your machine's resources while providing powerful features and a smooth development workflow, OrbStack seems like a compelling choice.

Insomnia founder's next

The creator of Insomnia is back with a new API client called Yaak, and the Insomnia founder's new project is already turning heads. Yaak is built from the ground up for modern developers, focusing on speed, privacy, and flexibility.

Unlike many cloud-first tools, Yaak is proudly offline-first and designed to be Git friendly. It stores your API requests and data as plain-text files, making it easy to track changes, branch, merge, and collaborate using your existing Git workflows.

It's engineered for performance using Rust and ReactJS, offering a fast and responsive experience. Yaak supports a wide range of protocols including HTTP, GraphQL, WebSockets, SSE, and gRPC all within a single workspace.

Yaak handles complex authentication methods like OAuth 2.0 automatically and provides powerful features like dynamic variables, request chaining, and multi-window support for multitasking. It also integrates easily by importing from Postman, Insomnia, OpenAPI, and Curl. Developers are praising its clean interface and calling it a faster, less bloated alternative to older tools.

Anyone Can Write Tests

Maestro is redefining mobile and web end-to-end testing, promising teams can "Move fast, break nothing" with a platform so intuitive that anyone can write tests with AI help. This single framework eliminates the need for multiple tools by supporting a wide range of platforms including iOS, Android, Web, React Native, Flutter, and many more.

The secret sauce is Maestro Studio. It empowers non-technical users by letting them visually explore app elements, record interactions to instantly generate test commands, and even chat with MaestroGPT, an AI assistant trained specifically on Maestro.

Beyond test creation, Maestro integrates seamlessly into your CI/CD pipeline, enabling shift-left testing to catch issues early in the development cycle. You get automated notifications, detailed test results, and clear performance metrics.

For speed and reliability, Maestro offers enterprise-grade cloud infrastructure, allowing you to run tests in parallel. In today's fast-paced development environment, Maestro helps teams maintain high quality by making robust testing accessible and efficient for everyone.

Python Toolchain 100x Faster

A new tool is shaking up Python package management, promising an incredibly fast Python toolchain. Written in Rust, uv is reported to be 10 to 100 times faster than pip for common operations like dependency resolution and installation.

This ambitious project aims to be a single replacement for popular tools like pip, pip-tools, pipx, poetry, pyenv, twine, and virtualenv. It provides integrated functionality for managing projects, running scripts, installing command-line tools, and even handling Python versions.

uv supports modern project management workflows, including universal lockfiles and Cargo-style workspaces. It also offers a pip-compatible interface, meaning you can potentially swap uv pip for your existing pip commands and immediately see performance gains.

With a global cache for efficiency and support across major operating systems, uv is positioned as a major step forward in the Python ecosystem, backed by the creators of the popular Ruff linter.

Groq: AI Gets Tools

Groq, known for its incredible speed, just unveiled Groq's new Compound AI system now available in preview. This isn't just another language model; it's designed to take action by using tools like web search and code execution.

Unlike traditional LLMs limited to their training data, this system can access real-time information and perform live computations. It autonomously decides when and how to use these tools, even running them multiple times within a single query before providing a response.

Critically, all the tool execution happens server-side. This bypasses the need for complex client orchestration, keeps latency low, and allows Groq to optimize performance end-to-end.

The system is powered by openly available models, leveraging Llama 4 Scout for core reasoning and Llama 3.3 70B for routing and tool selection. You're charged based on the token usage of these underlying models.

Developers can use this for AI agents, smarter assistants, and research tools that require up-to-date information or validation. Imagine getting real-time crypto prices or top trending news directly from the model.

Groq introduced a new benchmark, RealtimeEval, specifically for evaluating systems on current events and live data. On this benchmark, the Compound Beta outperformed search-enabled versions of GPT-4o and performed on par with Perplexity Sonar.

It's available in two versions: compound-beta supports multiple tool calls, while compound-beta-mini is faster for single-call tasks. You can start experimenting simply by changing the model string in your API calls. You can also try it out code-free via the Groq Desktop Beta. Groq is actively seeking feedback to refine the system.

AI Learns Reality

NVIDIA Cosmos introduces world foundation models designed to power the next generation of physical AI for robots and autonomous vehicles. This platform provides state-of-the-art generative models trained on massive amounts of real-world robotics and driving data.

These world models are built to understand and generate environments grounded in physics, offering capabilities like prediction, controllable world generation, and reasoning. Imagine an AI that doesn't just see the world, but can accurately predict how objects will move or generate realistic scenarios for training.

Cosmos isn't just models; it's a complete platform. It includes advanced tokenizers, accelerated data processing pipelines like NeMo Curator, and tools for post-training the models on your own data. This end-to-end workflow, supported by platforms like DGX Cloud, significantly speeds up development.

Developers can leverage Cosmos for crucial tasks like generating synthetic data to train perception systems or initializing and evaluating policy models that guide AI behavior. It's making the complex challenge of teaching AI about the physical world much more accessible.

What's exciting is NVIDIA's commitment to openness; the core models, guardrails, and tools are available under an Open Model License on platforms like Hugging Face and GitHub. This opens up powerful physical AI development to a wider community.

Object Storage Calculator

Navigating cloud storage costs can feel like a maze, but this object storage pricing calculator reveals some truly surprising findings. Forget everything you thought you knew about paying for traffic.

While giants like AWS and Azure charge for every gigabyte of data transferred out, several providers offer entirely free traffic, drastically cutting costs for data-heavy applications.

Beyond traffic, storage prices vary wildly, from fractions of a cent to several cents per gigabyte, often with minimum monthly fees that can make storing small amounts surprisingly expensive.

Understanding the provider's billing unit whether they charge per GB or round up to the nearest TB and accounting for included allowances is critical for predicting your actual bill.

This tool cuts through the complexity, showing how factors like free traffic, minimums, and billing models create massive cost differences across providers like Cloudflare R2, Hetzner, Backblaze, and Wasabi.

AI Audio in Minutes

Developers building with Next.js can now easily integrate powerful audio features thanks to this open-source Next.js Audio Starter Kit.

This kit, introduced by ElevenLabs, lets you quickly add capabilities like Text-to-Speech (TTS), Speech-to-Text (STT), sound effects, and even conversational AI directly into your product.

It's built using the ElevenLabs SDK, Next.js, shadcn/ui components, and Tailwind CSS v4, providing a modern and robust starting point.

The key benefit is speed you can add these complex audio functionalities in minutes, significantly accelerating development time for applications needing voice interaction or sound elements.

Cloud Based Code Agents

Get ready for a major shift in how you code: dev agents with full context are now available, living right in the cloud but working seamlessly within your VS Code environment. This isn't just another AI helper; these are full-fledged Remote Agents.

These cloud-based agents boast unprecedented full-codebase context, meaning they understand your entire project structure and logic instantly. Paired with deep IDE integration and full access to your toolchain, they can perform complex tasks right where you work.

Millisecond Web Audio Sync

Achieving perfectly synchronized audio playback across multiple devices in a web browser seems impossible, but synchronize web audio across multiple devices with millisecond accuracy. This open-source project, Beatsync, tackles the challenge head-on.

It uses techniques inspired by the Network Time Protocol (NTP) to keep playback precisely aligned, overcoming typical web limitations like variable network latency and browser clock drift.

Beyond core synchronization, Beatsync includes features like spatial audio controls for unique effects and a polished user interface with helpful status indicators.

Built as a Turborepo monorepo, it uses a Bun server, a Next.js client with Tailwind and Shadcn/ui, and is designed to be self-hostable with a simple quickstart guide.

While still in early development with experimental mobile support, Beatsync offers a glimpse into the future of synchronized web experiences and welcomes contributions to help refine its high-precision capabilities.

Bruno Is Offline Only

Bruno is revolutionizing API testing by understanding why Bruno stays offline. Unlike cloud-heavy tools, it keeps your API collections right on your filesystem.

It uses a simple plain text format called Bru, making it incredibly easy to manage your collections with Git or any version control you prefer.

This open-source IDE offers a lightweight yet powerful alternative to tools like Postman and Insomnia, focusing on data privacy and user control.

With features like cross-platform compatibility and a clear roadmap, Bruno is building a dedicated community.

NLP Dates without AI

Discover Chrono's natural language parsing capabilities with this powerful Javascript library. It's designed to effortlessly extract date and time information from plain text, handling everything from casual phrases like "Tomorrow" to formal formats and date ranges.

The library recently underwent a significant V2 rewrite in TypeScript, bringing improved structure and performance. A key change is focusing the default configuration on international English for better predictability, while still supporting multiple locales.

Chrono makes integrating date parsing into your applications much simpler by understanding inputs like "Last Friday", "5 days ago", "Sep 12-13", and standard ISO timestamps.

Under the hood, Chrono uses a pipeline of parsers and refiners to process text and identify date components. This flexible architecture also allows advanced users to customize parsing rules and results for specific needs or languages.

The output provides structured results detailing the parsed text, its location in the input, and ParsedComponents for start and end dates. You can easily access known or implied date parts and generate standard Javascript Date objects.

Task Manager for LLMs

Task Master brings AI task management directly into your code editor, transforming your development workflow with AI assistance. Imagine having an AI that not only understands your tasks but actively helps you manage and implement them within your familiar coding environment.

This open-source system integrates seamlessly with popular AI-powered editors like Cursor, Lovable, Windsurf, and Roo, leveraging models such as Claude, OpenAI, and others via MCP (Model Control Protocol) or a command-line interface. It can parse product requirements (PRDs), intelligently determine the next task to focus on, and even assist in generating the necessary code files.

It's a genuinely exciting step towards a future where AI isn't just a coding copilot but an integrated project manager, making task tracking and implementation significantly more efficient right where you code.

Instant Type-Safe Stacks

Tired of wrestling with complex setups for type-safe projects? A new CLI tool is here to build modern type-safe stacks instantly. It aims to simplify the process, bringing best practices and customizable configurations right to your terminal.

This isn't just another starter template; it provides end-to-end type safety from your database all the way to the frontend using tRPC. Imagine catching data errors before you even run your code that's the power of full type-safety.

The stack is decidedly modern, featuring popular choices like React, Hono or Elysia for the backend, and TanStack libraries for data fetching and routing. You also get flexibility with database options like SQLite (via Turso) or PostgreSQL, and ORM choices between Drizzle and Prisma.

It goes beyond the basics, offering integrated authentication with Better-Auth, optional support for Progressive Web Apps (PWAs) and desktop apps via Tauri, and a monorepo structure powered by Turborepo. It's designed to give you a robust foundation tailored to your needs.

Getting started is incredibly simple. Just run npx create-better-t-stack@latest (or use bun/pnpm), answer a few prompts, and you'll have a production-ready codebase in minutes. It truly delivers on its promise of a zero-config setup.

For deeper dives into features, guides, and examples, check out the official documentation. This tool looks set to become a go-to for developers prioritizing type safety and modern development practices.

Databases without Docker

Managing databases locally often means wrestling with Docker containers or virtual machines, but now you can manage databases without Docker using DBngin.

This tool simplifies getting started with popular databases like PostgreSQL, MySQL, MariaDB, and Redis, right on your Mac or Windows machine.

The truly surprising part is that it runs database servers natively. No bulky dependencies, no complex Docker setups, just click and go. It's like having dedicated database servers installed instantly, but all managed from one simple app.

DBngin makes it easy to quickly set up and switch between multiple versions and ports for different projects, and surprisingly, it's completely free, developed by the team behind TablePlus.

Dialogue in One Pass

Nari Labs has released this ultra-realistic dialogue model, Dia, a 1.6B parameter text-to-speech system designed for generating natural conversations. Its key innovation is generating highly realistic dialogue in a single pass.

This model goes beyond simple text reading. It can be conditioned on audio to control emotion and tone, and it produces non-verbal sounds like laughs or coughs using simple tags.

The pretrained model and inference code are open-source and available on Hugging Face. You can explore a demo comparing its output to other models or try it directly on their ZeroGPU Space.

Currently supporting only English, the model is hardware-accelerated and optimized for GPUs, requiring around 10GB of VRAM for efficient inference. Installation is available via pip or a Gradio UI.

It's vital to adhere to the strict usage disclaimer: the model must not be used for identity misuse, deceptive content, or any illegal or malicious activities. Ethical use is emphasized.

Future work includes optimizing speed, adding quantization for memory efficiency, and expanding platform support.

Debug LLM Agents

Bringing complex LLM agents to production reliably feels impossible, but Langfuse makes debugging LLM agents much more manageable. Building apps with chained LLM calls and tools introduces unique headaches compared to traditional software engineering.

When an agent goes off-track or gives a bad answer, pinpointing why in a multi-step process is incredibly difficult. LLMs are non-deterministic, making consistent quality assessment at scale a huge challenge.

Langfuse tackles this head-on with integrated tools covering the whole development lifecycle. Its observability features let you trace every step of an agent's execution, seeing inputs, outputs, and tool use in one view.

You can manage and version prompts centrally, run evaluations (like LLM-as-a-judge) on traces, and build datasets from real-world interactions to benchmark changes. It's like having a full dev environment specifically for your AI logic.

It's open-source, works with major frameworks (LangChain, LlamaIndex, etc.) and models, designed for production scale, and you can adopt features incrementally. This platform genuinely changes how you approach building robust LLM applications.

Video Diffusion on Laptops

Imagine generating long, high-quality videos even on a laptop GPU with only 6GB of memory. That's now possible with FramePack, a revolutionary next-frame prediction model that makes video diffusion feel as accessible as image diffusion.

The core innovation lies in packing input frame contexts to a constant length. This technical feat ensures the generation workload remains the same regardless of how long the video becomes.

This means FramePack can process thousands of frames on hardware previously considered insufficient for lengthy video tasks, as demonstrated in lllyasviel's FramePack GitHub repository. You get progressive generation with visual feedback as each section is created.

Installation is designed for desktop use, offering a simple one-click package for Windows and standard pip install for Linux. It's built as functional desktop software.

The recent FramePack-F1 update introduces forward-only sampling and a new anti-drifting regulation, pushing the capabilities even further. FramePack is making practical video generation accessible to a much wider audience.

Git with Undo

Ever wished your version control system had a built-in undo button? Meet this new version control system, Jujutsu (or jj), designed from the ground up to be both simple and powerful, while remaining compatible with your favorite Git tools.

Unlike Git, jj treats your working copy itself as a commit that's automatically updated. This means no more "dirty working tree" errors or needing git stash your current changes are always safely recorded.

Every single operation, from commits to pushes, is logged. This operation log allows you to easily step back in time or undo specific mistakes, finally bringing robust undo history to version control.

Conflicts aren't just messy text diffs; they're first-class objects that get recorded directly in commits. This enables features like automatic rebase that even propagates conflict resolutions downstream, simplifying complex patch-based workflows.

Despite its unique internal model, jj uses Git repositories as its storage backend, making it fully interoperable with existing Git remotes and workflows. You can fetch, push, and even use jj and git commands side-by-side in the same repo.

API Says No

Ever been stuck needing the perfect excuse or just a funny way to decline? Now there's this service just for saying no, a wonderfully simple API that delivers a random, creative rejection reason on demand.

Just hit the /no endpoint with a GET request, and you'll get a JSON object containing a unique "reason". It's got a generous rate limit of 120 requests per minute per IP, so you won't run out of ways to politely (or hilariously) say no.

From simulating rejection in testing to powering a witty bot or finding a quick personal excuse, this API is surprisingly versatile. The README mentions over 1000 reasons, covering everything from generic to downright bizarre!

It's built on Node.js with Express, is easily self-hostable, and even has a .devcontainer ready to go for quick development setup. It's a testament to building simple, fun tools.

React State In URL

Imagine managing your React component state not just in memory, but directly in the browser's URL query string. Meet nuqs, the React library that makes this wild idea a reality, offering a type-safe way to sync your state with search parameters across various frameworks like Next.js, Remix, and plain React SPAs.

It functions much like React's useState hook, providing a value and an updater function, but the source of truth becomes the URL itself. This means state persists on page reloads and is easily shareable via the URL.

The library includes built-in parsers for various types like integers, booleans, dates, arrays, and JSON, ensuring your state is not just a string. You can also define custom parsers for complex types.

For Next.js users, it offers features like shallow updates to avoid full page reloads for client-side state, support for useTransition for server updates, and a cache for type-safe access in server components. Batching updates and managing browser history (push/replace) are also handled.

This approach simplifies state management for things like filters, pagination, or form data, making your application state visible, bookmarkable, and shareable directly from the URL. It's a fresh perspective on state persistence in React applications.

Code Inside Spreadsheet

Forget everything you thought you knew about spreadsheets; this innovative spreadsheet platform from QuadraticHQ is changing the game. It brings powerful computational capabilities directly into a familiar grid interface.

It's not just cells and formulas. Quadratic integrates coding environments (like Python and Rust), AI capabilities, and direct data connections right into the spreadsheet interface, merging the flexibility of code with the accessibility of a spreadsheet.

Under the hood, it leverages high-performance languages like Rust and TypeScript, likely utilizing WebAssembly (WASM) to run code kernels efficiently and securely within the browser environment.

This means you can perform complex data analysis, build custom logic, and connect to external data sources without ever leaving your spreadsheet, enabling powerful ETL and data science workflows for everyone.

It's a compelling vision for the future of data work, making advanced tools more collaborative and accessible to a broader audience.

Free Cursor Pro for Students

Verified university students can now get a full year of Cursor Pro features completely free. This code editor, already favored by students at top universities worldwide, is making its advanced AI capabilities accessible to the next generation of developers.

Students highlight how Cursor dramatically speeds up their workflow, helping them quickly map out approaches, generate boilerplate code, and debug complex issues. It allows them to focus on the more enjoyable or critical parts of projects, like performance tuning or design, instead of getting bogged down in tedious tasks.

Whether tackling research, building startups, or mastering new languages and frameworks, users report significant productivity boosts and a reduced learning curve. It acts like an "AI research assistant" or a tool that helps turn complex ideas into working applications faster than ever thought possible.

The Secret Knowledge Book

This incredible collection of knowledge the book of secret knowledge is a GitHub repository packed with inspiring lists, manuals, cheatsheets, blogs, hacks, and tools. It's like a hidden treasure trove for anyone working in technical fields, especially those focused on systems, networks, and security.

The repository is primarily aimed at System and Network administrators, DevOps, Pentesters, and Security Researchers. However, the author notes that "everyone, really" can find something valuable inside.

You'll find resources covering everything from essential CLI tools like strace and tcpdump to advanced topics like penetration testing techniques and containers. There are sections dedicated to web tools, systems, networks, and even inspiring lists and blogs.

It's constantly updated, ensuring the information remains relevant. Whether you need a quick shell one-liner, a guide to hardening your system, or want to explore hacking labs, this repo has you covered. It's a truly invaluable resource for deepening your technical understanding.

Native Maps in Expo

Get ready for a game-changer in mobile mapping! Expo Maps is a brand new native module bringing high-performance, platform-specific map experiences directly into your Expo apps. This isn't just a wrapper; it leverages native iOS and Android features for smooth gestures like 3D views and fluid zooming.

Setting it up involves configuring API keys for Google Maps on Android, including obtaining your app's SHA-1 fingerprint for security. You'll add the expo-maps plugin and specify necessary permissions in your app.json. While iOS works out of the box with Apple Maps, Android requires this crucial step for Google Maps integration.

Controlling the map camera is straightforward, allowing you to set initial positions and zoom levels. What's really impressive is using a ref to programmatically change the camera position with beautiful, smooth transitions, making navigation between points a seamless experience for the user.

Adding visual elements like markers, annotations (iOS only, allowing custom images), and polylines for drawing routes is fully supported. The API handles platform differences gracefully, offering distinct properties like system images and tint colors on iOS markers versus draggable pins on Android. You can easily customize the look and feel of these elements.

Beyond visuals, you have extensive control over map properties like map type (standard, hybrid, satellite, terrain), enabling traffic layers, and controlling user interactions like selecting points of interest or preventing selection. The map also exposes a rich set of events, letting you react to user clicks, long presses, marker taps, and camera movements, providing granular control over the map's behavior.

Boomerang Tasks

Ever wondered how an AI could handle really complex projects? Roo Code introduces "Boomerang Tasks," a powerful feature that acts like a project management superpower within the system.

These tasks let you break down large, multi-step objectives into smaller, focused subtasks. It's like having an AI that can delegate parts of its own work to specialized assistants.

The real innovation is delegating each subtask to the specific AI mode or model best equipped for that job. Need code written? Send it to code mode. Need architecture planned? Delegate to architect mode.

When a subtask is completed, it "boomerangs" back to the main process with a summary of its work. This keeps the overall workflow incredibly clean.

All the detailed back-and-forth interactions stay contained within the boomerang task itself, preventing clutter in your main view. This demo highlights how Roo Code orchestrates these tasks, even switching to a browser mode to verify work.

This is currently a custom feature in Roo Code, showcasing a fascinating direction for AI workflow automation and task management.

Load Tests

Artillery runs load tests in your cloud, offering a fundamentally different approach to platform-based performance testing. Instead of relying on a hosted service with potentially high costs and infrastructure management, Artillery orchestrates tests that execute directly within your AWS or Azure account.

This serverless model means you only pay for the actual compute time used during tests, which is presented as significantly more cost-efficient than traditional hosted solutions. It also enhances security and governance by keeping your testing traffic and data within your own cloud environment.

The platform is designed to test a wide array of services, supporting everything from standard HTTP APIs, GraphQL, and WebSockets to gRPC, Kafka, and even full browser flows using Playwright for realistic user simulation and Core Web Vitals analysis. You get integrated reporting and collaboration tools to analyze results.

Artillery also boasts extensive integrations with popular CI/CD systems like GitHub Actions, GitLab, and Jenkins, as well as observability platforms including OpenTelemetry, Datadog, and Lightstep. This allows for continuous load testing as part of your standard development workflow.

Ultimately, Artillery aims to solve the common pain points of load testing complexity, cost, and infrastructure overhead by providing a modern, scalable, and easy-to-use platform that makes production-grade performance testing accessible to all teams.

AI Gets Browser Eyes

What if your AI code assistant could actually see the UI you're working on? This innovative browser toolbar called stagewise gives your AI agents real-time context directly from the browser, revolutionizing frontend development workflows.

It's like giving your AI "eyesight" for the web. You can select specific elements in your running web app, leave comments on them, and stagewise sends all the relevant details including DOM structure, screenshots, and metadata straight to your code editor.

This eliminates the tedious process of manually describing UI problems or pasting file paths into AI prompts. Your AI agent gets immediate, accurate context, allowing it to understand the visual and structural aspects of the element you're focused on.

Stagewise works out of the box but is also highly customizable via a configuration file. It connects to your MCP server and is designed not to impact your application's production bundle size.

It offers dedicated integration packages for popular frameworks like React, Next.js, Vue, and SvelteKit, making setup straightforward. Examples show how to initialize it for seamless integration into your development environment.

While currently offering full support for Cursor, development is underway to expand compatibility to other popular AI agents like GitHub Copilot. This aims to make the browser-to-editor context available across different AI tools.

That’s all for this month!
Jay

Update Night

Discussion about this post

Ready for more?