Predicting Apple’s A.I. Play

iSolutions
17 min read · Feb 18, 2024


Studying the clues from various releases to predict what's next for Apple in A.I.

Apple’s AI “breadcrumbs” timeline

Amid the flurry of headlines dominated by tech giants making bold strides in artificial intelligence, Apple appears to navigate a quieter path, seemingly absent from AI industry news.

However, a closer examination reveals a trail of breadcrumbs, hinting at the company’s strategic foray into AI that could soon reshape the landscape.

This understated approach, characterized by selective acquisitions, covert project developments, and meticulous integration of AI into its ecosystem, suggests Apple is crafting a future where AI enhances its suite of products in uniquely Apple ways.

By piecing together these clues, we get a glimpse of what might be Apple's AI play.

Acquisitions:

In January 2020, Apple acquired Xnor.ai, which began as a process for making machine learning algorithms so efficient that they could run on even the lowest tier of hardware, such as mobile devices that draw only a modicum of power. Using Xnor's algorithms, those devices could accomplish tasks like object recognition that would otherwise require a powerful processor or a connection to the cloud.

Later in 2020, Apple acquired Vilynx, which developed a self-learning AI platform for media companies that understands content, personalizes the user experience on publisher websites and mobile/OTT apps, and optimizes content creation and distribution. Deep metadata tagging is the foundation that drives automated previews, recommendations, and smart search.

Apple also recently acquired Datakalab, a Paris-based AI startup known for its expertise in data compression and image analysis. Finalized in December 2023 but only disclosed recently, the acquisition underscores Apple's aggressive push toward efficient, low-power AI algorithms that can operate independently of cloud services.

Remarkably, Apple acquired as many as 32 AI startups in 2023 alone, topping the acquisition lists of tech companies globally.

What this could mean: The integration of technologies from Xnor.ai, Vilynx, and Datakalab could lead to a transformative upgrade in Siri’s capabilities, shifting towards enhanced edge computing. This would enable Siri to process data locally on devices, potentially transforming user interactions by making Siri a more integral and trusted part of the Apple ecosystem. Such advancements would not only enhance the responsiveness and capability of Siri but also underscore Apple’s commitment to privacy and security, reinforcing its competitive edge in the tech industry.

AJAX:

In July 2023, rumors emerged about "AJAX," Apple's internal code name for an "Apple GPT" chatbot built on the company's proprietary language model framework, Ajax.

The development of Apple GPT is part of Apple’s broader efforts in the field of artificial intelligence. The company has been relatively quiet about these efforts compared to other tech giants.

The origin of the Ajax name is unclear, but it could be a combination of Google's "JAX" and the "A" from Apple.

The Apple GPT project is still under development. However, it’s reported that Apple is planning to make a major AI-related announcement in 2024, which could be the general release of Apple GPT.

Since then, this author has confirmed the existence of AJAX internally at Apple.

What this could mean: The rumored development of AJAX, or "Apple GPT," could herald a new era of AI integration within Apple's ecosystem, from smarter, more context-aware search in Safari and Spotlight, AI-driven health and fitness advice, advanced educational tools, and creative content generation to new forms of interactive entertainment. Apple's focus on privacy and the user experience could set AJAX apart in the crowded field of AI and language models.

Ferret:

In October 2023, Apple released Ferret, an open-source, multimodal Large Language Model (LLM) developed in collaboration with Cornell University, which represents a significant pivot from its traditionally secretive approach towards a more open stance in the AI domain.

Ferret distinguishes itself by integrating language understanding with image analysis, enabling it to not only comprehend text but also analyze specific regions within images to identify elements and use these in queries. This capability allows for more nuanced interactions, where Ferret can provide contextual responses based on both text and visual inputs​.
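
To make the idea of region-grounded queries concrete, here is a purely illustrative Python sketch of what referring to a specific part of an image might look like. The function, prompt format, and region token below are hypothetical stand-ins, not Ferret's actual interface; the real code and checkpoints live in the apple/ml-ferret repository.

```python
# Purely illustrative: a toy sketch of a region-grounded multimodal query.
# The function name, prompt format, and region token are hypothetical, not
# Ferret's real interface.
from dataclasses import dataclass

@dataclass
class Region:
    x0: int
    y0: int
    x1: int
    y1: int  # bounding box in image pixel coordinates

def build_grounded_prompt(question: str, region: Region) -> str:
    """Embed a referred image region into the text prompt, so the model can
    reason about 'this object' rather than the whole image."""
    region_token = f"<region>{region.x0},{region.y0},{region.x1},{region.y1}</region>"
    return question.replace("[here]", region_token)

prompt = build_grounded_prompt(
    "What is the object [here] and what is it typically used for?",
    Region(120, 80, 260, 210),
)
print(prompt)
```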

What this could mean: The introduction of Ferret into Apple's ecosystem could potentially revolutionize how users interact with Apple devices, offering enhanced image-based interactions and augmented user assistance. For instance, Siri could leverage Ferret's capabilities to understand queries about images or perform actions based on visual content, significantly improving the user experience by providing more accurate and context-aware responses. Ferret could also enrich media and content understanding, improving organization and search within Apple's Photos app and even offering more personalized content recommendations across Apple's services.

MLX:

In December 2023, Apple released MLX and MLX Data, signifying a pivotal shift towards empowering developers to create more sophisticated AI applications that are optimized for Apple Silicon.

The MLX framework, inspired by PyTorch, Jax, and ArrayFire but with the unique feature of shared memory, simplifies the process for developers to build models that work seamlessly across CPUs and GPUs without the need to transfer data.

MLX is a NumPy-like array framework designed for efficient and flexible machine learning on Apple silicon, brought to you by Apple machine learning research.

The Python API closely follows NumPy with a few exceptions. MLX also has a fully featured C++ API which closely follows the Python API.
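
To give a feel for the framework, here is a minimal sketch of that NumPy-like Python API, assuming MLX is installed (pip install mlx) on an Apple silicon Mac; exact API details may evolve with the project.

```python
import mlx.core as mx

# Arrays live in unified memory, visible to both the CPU and the GPU.
a = mx.random.normal((1024, 1024))
b = mx.random.normal((1024, 1024))

# Operations are lazy; nothing is computed until a result is actually needed.
c = a @ b + 1.0

mx.eval(c)                      # force evaluation
print(c.shape, c.dtype)
```

Because the arrays sit in unified memory, the same computation can be dispatched to the CPU or GPU without copying data back and forth, which is the shared-memory feature the framework highlights.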

What this could mean: This could lead to a new era of generative AI apps on MacBooks, which may include capabilities similar to Meta’s Llama or Stable Diffusion.

HUGS:

Also in December 2023, Apple's machine learning research team released "HUGS: Human Gaussian Splats" in collaboration with the Max Planck Institute for Intelligent Systems, introducing a novel approach to creating animatable human avatars and scenes from monocular videos using 3D Gaussian Splatting.

This method efficiently separates and animates humans within scenes, achieving state-of-the-art rendering quality and speed. It addresses challenges in animating 3D Gaussians, optimizing for realistic movement and enabling novel pose and view synthesis at high speeds, significantly outperforming previous methods in both training time and rendering speed.

https://mlr.cdn-apple.com/video/novel_pose_view_2_79fa4a7a4f.mp4
https://mlr.cdn-apple.com/video/novel_multihuman_scene_1_c5b2b300a.mp4
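
For readers unfamiliar with the representation, the sketch below shows the kind of per-primitive parameters a 3D Gaussian Splatting scene carries; it is only meant to make the term concrete. HUGS's actual formulation, which binds the Gaussians to an articulated human body model and learns how they deform, is considerably more involved.

```python
# A minimal, illustrative sketch of the parameters behind a "Gaussian splat".
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianSplat:
    mean: np.ndarray      # (3,)  center of the Gaussian in 3D
    rotation: np.ndarray  # (4,)  unit quaternion orienting its ellipsoid
    scale: np.ndarray     # (3,)  per-axis extent of the ellipsoid
    opacity: float        #       how much it contributes when rasterized
    color: np.ndarray     # (3,)  RGB (real systems use spherical harmonics)

# A "scene" is simply a large collection of these, optimized from video frames.
splats = [
    GaussianSplat(np.random.randn(3), np.array([1.0, 0.0, 0.0, 0.0]),
                  np.full(3, 0.02), 0.8, np.random.rand(3))
    for _ in range(1000)
]
print(len(splats), "Gaussians in the toy scene")
```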

What this could mean: Apple’s “HUGS: Human Gaussian Splats” could enable more realistic and interactive 3D animations from simple video inputs. This could lead to advancements in augmented reality experiences, improved virtual assistants, and more immersive gaming and social media applications on Apple devices. The technology’s efficiency in rendering and animating could also enhance user experiences across the Apple ecosystem, making digital interactions more lifelike and engaging.

Flash Memory:

In December 2023, Apple released a research paper titled "LLM in a flash: Efficient Large Language Model Inference with Limited Memory," which noted that flash storage is more abundant in mobile devices than the RAM traditionally used for running LLMs.

Their method cleverly bypasses the limitation using two key techniques that minimize data transfer and maximize flash memory throughput:

  1. Windowing: Think of this as a recycling method. Instead of loading new data every time, the AI model reuses some of the data it already processed. This reduces the need for constant memory fetching, making the process faster and smoother.
  2. Row-Column Bundling: This technique is like reading a book in larger chunks instead of one word at a time. By grouping data more efficiently, it can be read faster from the flash memory, speeding up the AI’s ability to understand and generate language.

The combination of these methods allows models up to twice the size of the device's available memory to run, according to the paper. This translates to a 4–5 times increase in speed on standard processors (CPUs) and an impressive 20–25 times speedup on graphics processors (GPUs). "This breakthrough is particularly crucial for deploying advanced LLMs in resource-limited environments, thereby expanding their applicability and accessibility," write the authors.
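
The toy sketch below illustrates only the windowing intuition: weights for recently active neurons stay cached in RAM, so consecutive tokens trigger very few reads from flash. The function names, sizes, and eviction policy are illustrative, not the paper's implementation; row-column bundling would further store matching up- and down-projection weights contiguously so a single read fetches both.

```python
# Toy illustration of "windowing": reuse recently loaded weight rows instead
# of fetching everything from flash for every token.
import numpy as np

HIDDEN = 8192
flash_weights = np.random.randn(HIDDEN, 512).astype(np.float32)  # stands in for on-disk weights

cache: dict[int, np.ndarray] = {}   # neuron index -> weight row resident in RAM
window: list[int] = []              # recently active neurons (the sliding window)
WINDOW_SIZE = 2048

def load_rows_from_flash(indices: list[int]) -> None:
    """Pretend flash read: only fetch rows we don't already hold in RAM."""
    for i in indices:
        if i not in cache:
            cache[i] = flash_weights[i]          # the expensive transfer
    window.extend(indices)
    while len(window) > WINDOW_SIZE:             # evict the oldest neurons
        old = window.pop(0)
        if old not in window:
            cache.pop(old, None)

# Consecutive tokens tend to activate many of the same neurons, so most
# lookups hit the cache and little data moves from flash to RAM.
for step in range(3):
    active = list(np.random.choice(HIDDEN, size=1500, replace=False))
    load_rows_from_flash(active)
    print(f"step {step}: {len(cache)} weight rows resident in RAM")
```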

What this could mean: Apple’s research on running Large Language Models (LLMs) efficiently on mobile devices could herald a new era of on-device AI processing. By optimizing data handling through “windowing” and “row-column bundling,” Apple has potentially developed a method to significantly speed up AI inference on standard CPUs and GPUs. This breakthrough could lead to faster and more powerful AI functionalities directly on iPhones and iPads without relying on cloud computing, enhancing user experience while maintaining Apple’s strong stance on privacy.

Siri Summarizations:

In January 2024, iOS 18 leaks revealed Siri summarization functionality that specifically references OpenAI.

The leaked images show specific functions referencing both AJAX and OpenAI.

Specific reference to “OpenAIGPT”

The leak mentions an "OpenAISettings" section and references issues such as "SummarizationOpenAIError" and a missing OpenAI API key.

This suggests that there’s an attempt to make an API call to OpenAI’s service, likely for a feature involving text summarization.

Specific reference to “AJAXGPTonDevice”

What this could mean: This could indicate that Apple is testing or developing a feature using OpenAI’s language models for summarization purposes within its software, as suggested by the “SummarizationOpenAIError.” The code seems to be part of an internal testing process for integrating OpenAI’s language models into Apple’s ecosystem, potentially for improving Siri’s functionalities or other text-based features in iOS.

MGIE:

In February of 2024, Apple released MGIE, representing a significant advancement in instruction-based image editing.

Utilizing multimodal large language models (MLLMs), MGIE interprets natural language commands for precise pixel-level image manipulations, covering a spectrum from Photoshop-style modifications to global photo enhancements and detailed local edits.

Developed in collaboration with the University of California, Santa Barbara, and showcased at ICLR 2024, MGIE underscores Apple’s growing AI research capabilities, offering a practical tool for creative tasks across personal and professional domains.

What this could mean: MGIE's release could significantly propel Apple's AI capabilities, especially in creative and personalization applications. This move may lead to more intuitive interfaces for content creation, potentially integrating into existing Apple products to offer advanced editing features directly within the ecosystem. MGIE's integration into Apple's software or hardware offerings could revolutionize user interaction with devices, making advanced image editing accessible directly from iPhones, iPads, or Macs. Imagine Siri or the Photos app leveraging MGIE for editing commands, enhancing user creativity without complex software.

Keyframer:

Also in February 2024, Apple released Keyframer, a generative AI tool for animating 2D images using text descriptions, showcasing the potential of LLMs in animation.

It simplifies the animation process, allowing users to animate SVG images through text prompts without coding knowledge. While promising, Keyframer is in the prototype stage, highlighting the evolving landscape of AI in creative fields and suggesting a future where AI tools could significantly augment creative workflows.

SVG images and text descriptions fed into Keyframer are automatically converted into animation code. Image: Apple

What this could mean: Keyframer could be integrated into Apple’s software ecosystem, enhancing creative tools in applications like Final Cut Pro, iMovie, or even Pages and Keynote, by enabling easy animation creation.

Xcode AI:

In February 2024, reports of an Apple AI coding tool for Xcode emerged. A report from Bloomberg says Apple has expanded internal testing of new generative AI features for its Xcode programming software and plans to release them to third-party developers this year.

Apple also reportedly looked at potential uses for generative AI in consumer-facing products, like automatic playlist creation in Apple Music, slideshows in Keynote, or AI chatbot-like search features for Spotlight search.

What this could mean: Apple’s development of AI coding tools could be transformative for its suite of developer services, particularly Xcode. These tools, potentially rivaling GitHub’s Copilot, would assist developers in writing code more efficiently, likely by suggesting code snippets, auto-completing lines of code, or even generating code from comments. This could greatly speed up the development process, reduce bugs, and make development more accessible to a broader range of skill levels.

iWork.ai:

In February 2024, according to BuyAIDomains, Apple became the owner of the "iWork.ai" domain, with the company's name and its Apple Park business address appearing in the ownership records.

What this could mean: The domain fuels speculation that Apple's office suite, consisting of Pages, Keynote, and Numbers, could soon be getting major artificial intelligence features.

ReALM:

Apple’s research paper, “ReALM: Reference Resolution As Language Modeling,” introduces a groundbreaking method to enhance how AI understands and interacts with visual and conversational contexts. This research is pivotal as it addresses the challenge of AI comprehending ambiguous references within conversations or related to elements visible on a screen, such as buttons or text. Traditional models struggled with these tasks due to their reliance on vast amounts of data and computing power, which are not always feasible in on-device applications.

Apple’s solution, ReALM, revolutionizes this by transforming reference resolution into a language modeling problem, making it more adaptable and less resource-intensive. By employing smaller, fine-tuned language models, ReALM achieves impressive performance gains in resolving references, particularly for on-screen content. This approach not only boosts the AI’s speed and efficiency but also its ability to operate independently of the cloud, thereby enhancing privacy and usability in mobile settings. The integration of this technology could significantly improve user interaction with devices, allowing more intuitive and seamless control through natural language commands.
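
A toy illustration of the general idea follows: if the entities visible on screen are serialized into plain text, then "which thing does the user mean?" becomes a question a language model can answer directly. The entity format and encoding below are simplified stand-ins, not the paper's exact scheme.

```python
# Toy sketch: turn on-screen entities into text so that resolving a reference
# like "call that number" becomes a language-modeling problem.
from dataclasses import dataclass

@dataclass
class ScreenEntity:
    entity_id: int
    kind: str    # e.g. "phone_number", "address", "button"
    text: str
    top: int     # rough vertical position, used to order entities as displayed
    left: int

def screen_to_prompt(entities: list[ScreenEntity], user_request: str) -> str:
    # Order entities roughly as a user would read them (top to bottom, left to right).
    ordered = sorted(entities, key=lambda e: (e.top, e.left))
    lines = [f"[{e.entity_id}] ({e.kind}) {e.text}" for e in ordered]
    return ("Screen:\n" + "\n".join(lines) +
            f"\nUser: {user_request}\nWhich entity id does the user mean?")

entities = [
    ScreenEntity(1, "business_name", "Rainbow Pharmacy", 40, 10),
    ScreenEntity(2, "phone_number", "(415) 555-0134", 80, 10),
    ScreenEntity(3, "address", "1 Main St, San Jose", 120, 10),
]
print(screen_to_prompt(entities, "Call that number"))
```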

What this could mean: Apple’s research could set a new standard for AI interactions, making digital assistants like Siri more perceptive and helpful in everyday tasks. This development promises not only to enhance the functionality of current devices but also to drive innovation in new areas, including accessibility and user interface design. Apple’s focus on on-device processing and privacy-first methodologies positions ReALM as a forward-thinking solution that aligns with modern data security and efficiency needs, potentially influencing future AI applications across the tech industry.

OpenELM:

Apple has introduced OpenELM, a new language model characterized by a unique architecture that optimizes parameter allocation across its layers. This model, detailed in the paper “OpenELM: An Efficient Language Model Family with Open Training and Inference Framework,” showcases a significant improvement in accuracy, outperforming other models of similar size. OpenELM utilizes a layer-wise scaling strategy within its transformer model, enhancing its efficiency.

This approach has allowed it to achieve higher accuracy with fewer pre-training tokens, making it a compelling option for resource-efficient AI processing.
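
The sketch below shows roughly what layer-wise scaling means in practice: rather than giving every transformer layer identical dimensions, the attention width and feed-forward multiplier grow with depth so parameters are spent where they help most. The specific coefficients are illustrative, not OpenELM's published configuration.

```python
# Illustrative layer-wise scaling: per-layer head counts and FFN widths grow
# linearly with depth instead of being constant across the network.
def layer_wise_scaling(num_layers: int, d_model: int, head_dim: int,
                       alpha: tuple[float, float] = (0.5, 1.0),
                       beta: tuple[float, float] = (2.0, 4.0)):
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)              # 0 at the first layer, 1 at the last
        a = alpha[0] + (alpha[1] - alpha[0]) * t    # scales attention width
        b = beta[0] + (beta[1] - beta[0]) * t       # scales the FFN multiplier
        n_heads = max(1, round(a * d_model / head_dim))
        ffn_dim = round(b * d_model)
        configs.append({"layer": i, "heads": n_heads, "ffn_dim": ffn_dim})
    return configs

for cfg in layer_wise_scaling(num_layers=8, d_model=1024, head_dim=64):
    print(cfg)
```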

Apple’s OpenELM is not just about architectural innovation; it also marks a shift towards transparency in AI research. By releasing the complete framework for training and evaluation, including code and pre-training configurations, Apple supports open research and replicability, a move that can accelerate advancements across the AI community.

OpenELM’s performance is bolstered by its deployment on publicly available datasets, and its codebase is accessible for customization and improvement on platforms like GitHub and HuggingFace. This open-source approach is poised to foster a more collaborative and inclusive environment for AI research and development.

What this could mean: The implications of OpenELM’s development are vast, particularly for Apple’s ecosystem. Integrating OpenELM could enhance the capabilities of Apple’s devices, especially in handling complex AI tasks efficiently on-device without needing to rely heavily on cloud computing. This could lead to faster, more responsive applications, and services that can operate with heightened privacy and security. For users, this means more powerful, seamless interactions with their devices, potentially transforming how they engage with AI-driven features. As Apple continues to integrate these advancements, it could significantly shift the competitive landscape, emphasizing the importance of efficient, scalable, and open AI systems.

CatLIP:

Apple’s research paper “CatLIP: CLIP-level Visual Recognition Accuracy with 2.7× Faster Pre-training on Web-scale Image-Text Data” introduces a transformative approach called CatLIP, which accelerates the pre-training of visual models.

By reframing image-text pre-training as a classification task rather than a contrastive learning task, CatLIP eliminates the need for pairwise similarity computations, significantly speeding up the training process without sacrificing performance. This method enables models to achieve similar downstream accuracy to more traditional methods while using less computational resources, demonstrating its potential to handle large-scale data more efficiently.
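
The toy sketch below contrasts the two framings at the level of the training target: each caption is reduced to a multi-label vector over a fixed concept vocabulary, and the image encoder is trained with an ordinary classification loss, so no image-to-caption similarity matrix is ever computed. The tiny vocabulary and word matching here are illustrative; CatLIP derives its label space from WordNet synsets extracted from captions.

```python
# Toy sketch: caption -> multi-label classification target, trained with a
# plain binary cross-entropy loss instead of a contrastive objective.
import numpy as np

VOCAB = ["dog", "cat", "beach", "ball", "car", "tree"]
WORD_TO_IDX = {w: i for i, w in enumerate(VOCAB)}

def caption_to_multilabel(caption: str) -> np.ndarray:
    target = np.zeros(len(VOCAB), dtype=np.float32)
    for word in caption.lower().replace(".", "").split():
        if word in WORD_TO_IDX:
            target[WORD_TO_IDX[word]] = 1.0
    return target

def bce_loss(logits: np.ndarray, targets: np.ndarray) -> float:
    probs = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-7
    return float(-(targets * np.log(probs + eps) +
                   (1 - targets) * np.log(1 - probs + eps)).mean())

target = caption_to_multilabel("A dog chasing a ball on the beach.")
logits = np.random.randn(len(VOCAB)).astype(np.float32)  # stand-in for image-encoder output
print("targets:", target, "loss:", round(bce_loss(logits, target), 3))
```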

CatLIP’s methodology is not only about speed but also about making AI training more accessible and feasible on a broader scale. By circumventing the computational burdens of contrastive learning, CatLIP allows for quicker iterations and potentially lower costs, making it suitable for applications where rapid model updates are crucial, such as mobile and edge devices.

Plus, the open-source availability of CatLIP’s training framework encourages further innovation and collaboration in the AI community, supporting a more inclusive development environment.

What this could mean: The integration of CatLIP into Apple’s ecosystem could significantly enhance the AI capabilities of its devices, particularly in improving the efficiency and responsiveness of AI-driven features. For instance, this could lead to better performance of visual recognition tasks on iPhones and iPads, even in data-intensive scenarios. It also aligns with Apple’s privacy-focused strategy by enabling more powerful on-device processing, which reduces dependency on cloud-based computations. This could be a game-changer in how future Apple devices handle AI tasks, making them faster and more efficient while maintaining user privacy.

STEER:

Apple’s innovative approach to improving voice assistant interactions is encapsulated in their new development, STEER (“Semantic Turn Extension-Expansion Recognition”), as described in their latest research paper. STEER is designed to identify and facilitate ‘steering’ — a user’s follow-up commands that aim to modify or clarify previous voice commands.

This technology addresses the frequent interruptions and repetitions in voice interactions, which often degrade user experience. By accurately detecting steering commands, STEER reduces the need for users to restart or rephrase their requests, thus smoothing the flow of interaction.

The paper introduces STEER+, an enhanced version of the model that incorporates Semantic Parse Trees (SPTs) to provide context about the intent and entities in a conversation. This addition significantly improves the system’s ability to understand and process user commands, especially in complex queries involving named entities. The use of SPTs marks a substantial advancement in making voice assistants more responsive and accurate, mirroring more natural human-to-human interactions.
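
To make the task concrete, the purely illustrative sketch below shows the kind of decision STEER has to make for each follow-up utterance. The keyword heuristic is a hypothetical stand-in for the paper's learned models, included only to show the inputs and outputs of the problem.

```python
# Purely illustrative: is a follow-up utterance "steering" the previous
# command, or starting a new one? A real system uses trained models, not
# keyword cues.
STEERING_CUES = ("make it", "actually", "instead", "and also", "no,", "change it")

def looks_like_steering(follow_up: str) -> bool:
    text = follow_up.lower()
    return text.startswith(STEERING_CUES) or any(cue in text for cue in STEERING_CUES)

dialog = [
    ("Set a timer for ten minutes", "Make it fifteen"),          # steering: edit the timer
    ("Set a timer for ten minutes", "What's the weather like"),  # new, unrelated command
]
for first, follow_up in dialog:
    print(f"{first!r} -> {follow_up!r}: steering={looks_like_steering(follow_up)}")
```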

What this could mean: The integration of STEER and STEER+ into Apple’s suite of products could revolutionize how users interact with their devices. For instance, future versions of Siri could become much more adept at handling follow-up questions or commands without requiring the user to repeat the entire context. This capability could extend to other Apple services and devices, enhancing the overall ecosystem’s interactivity and user-friendliness. Moreover, the improvements in conversational AI could lead to broader applications in customer service, accessibility features, and interactive learning environments, setting new industry standards for voice assistant technologies.

WWDC:

Apple's annual Worldwide Developers Conference typically takes place in June. Apple has not yet announced the dates for this year's event.

A new rumor claims that Apple's generative AI technology will be included not just in Siri running locally on iPhones, but also in other services to be announced at Apple's 2024 Worldwide Developers Conference.

What this could mean: Get your popcorn ready, clarity on these rumors could be coming in June.

The Prediction:

A Unified Vision for Apple’s AI Future

The convergence of Apple’s various AI advancements suggests a future where its devices and ecosystems are not only more powerful but also more intuitive, responsive, and privacy-focused. Integrating technologies from acquisitions like Xnor.ai, Vilynx, and Datakalab with new developments such as AJAX, Ferret, and CatLIP, Apple is poised to make significant strides in on-device AI processing. This shift will enhance Siri’s capabilities, making it an indispensable, trusted assistant that processes data locally, thereby bolstering user privacy and security.

Transformative On-Device Experiences

With the integration of STEER and STEER+, Siri could handle follow-up commands more efficiently, reducing the need for users to restate their requests. This improvement would extend across Apple’s ecosystem, from iOS and iPadOS to macOS, enhancing user interaction with all Apple devices. Technologies like OpenELM and ReALM will ensure these AI models operate efficiently on-device, minimizing the reliance on cloud computing. This will lead to faster, more secure AI functionalities, enabling features like context-aware search results, AI-driven health advice, advanced educational tools, and creative applications directly on iPhones, iPads, and Macs.

The rumored development of AJAX, or “Apple GPT,” could herald a new era of AI integration within Apple’s ecosystem, potentially enhancing Safari and Spotlight for smarter, more context-aware search results, providing AI-driven health and fitness advice, advanced educational tools, creative content generation applications, and new forms of interactive entertainment. Apple’s focus on privacy and user experience could set AJAX apart in the crowded field of AI and language models.

Enhanced Software and Development Tools

Apple’s AI advancements will also significantly impact its software and developer tools. Integrating Keyframer and MGIE into applications like Final Cut Pro, iMovie, Pages, and Keynote will streamline the creative process, allowing users to generate and edit content with simple commands. In the realm of development, AI-powered tools akin to GitHub’s Copilot could be integrated into Xcode, making coding more efficient and accessible by suggesting code snippets and auto-completing lines of code. Apple’s office suite, consisting of Pages, Keynote, and Numbers, could also see big AI-driven enhancements, making these applications more powerful and user-friendly.

A New Era of Interactivity and Privacy

Looking forward, the integration of these AI technologies will not only enhance individual user experiences but also foster more secure and responsive interactions across Apple’s services. Technologies like Ferret will revolutionize image-based interactions, allowing Siri to understand and act on visual content, thereby improving the functionality of apps like Photos and enhancing media organization and search. In augmented reality and gaming, advancements such as “HUGS: Human Gaussian Splats” will create more realistic and interactive 3D animations, making digital interactions more lifelike and engaging.

The incorporation of MGIE could propel Apple’s AI capabilities, especially in creative and personalization applications, leading to more intuitive interfaces for content creation. This move may offer advanced editing features directly within the ecosystem, allowing users to edit images and videos by simply describing the changes they want.

Broader Implications for the Apple Ecosystem

Apple’s focus on efficient, scalable, and privacy-focused AI will set new standards in the tech industry. By enabling powerful on-device AI processing, Apple will ensure that its devices are not only more responsive and capable but also more secure, maintaining the company’s strong stance on user privacy. This holistic approach to integrating AI across its ecosystem will transform how users interact with technology, making it more seamless, intuitive, and personalized, thereby solidifying Apple’s position as a leader in the AI-driven future.

So…What’s Next Then?

Apple, often perceived as trailing behind in the artificial intelligence (AI) race, seems to be strategically biding its time, gearing up for a substantial leap in the field.

Despite initial impressions of their hesitant approach, recent developments and rumors suggest that Apple is actively enhancing its AI capabilities.

They are reportedly collaborating with industry giants like OpenAI and Google, and bolstering their proprietary AI model, Ajax. This implies a significant shift from a cautious engagement to a more robust involvement in AI, underpinned by substantial research efforts that hint at future innovations.

The core of Apple’s AI advancements appears to be focusing on efficiency and utility, particularly with its virtual assistant, Siri.

By developing smaller, more efficient AI models capable of running directly on devices without internet dependency, Apple is paving the way for a more responsive and capable Siri.

This on-device processing is not just a technical feat but a strategic move to ensure privacy and speed, enhancing user experience significantly. Research efforts like the EELBERT system show Apple’s commitment to creating powerful yet compact AI models that maintain high performance while occupying less space and consuming less power.

Looking ahead, Apple’s AI research is shaping an ecosystem where AI functionalities extend beyond conventional applications. With AI-powered features anticipated to be embedded in various services — from health monitoring to creative tools and potentially transforming Siri into a proactive and almost prescient assistant — the possibilities are expansive.

Apple’s vision for AI seems not only to enhance functionality but also to integrate seamlessly into daily life, ensuring that their devices are not just tools but proactive participants in managing digital and physical environments.

This approach could redefine user interactions with technology, making Apple’s AI advancements not just an upgrade but a transformative experience for its users.


iSolutions

Multiple award-winning experts in custom applications, machine learning models and artificial intelligence for business.