Sanity Check - Week of April 22, 2024

Welcome to this week's edition of LLMSanity, your one-stop shop for all things Large Language Model research! We know that keeping up with the latest developments in LLMs can be a little overwhelming, which is why we're here to help you stay sane. In this issue, we've got a platter of papers sure to satisfy your appetite for knowledge, from efficient FP6-centric serving of LLMs to state-of-the-art performance on complex reasoning tasks. Then comes the Business Meltdown, where we dish out the latest news and updates from the world of LLMs in business, and we wrap up with the Community Corner. So sit back, relax, and let us serve up some sanity-saving LLM goodness!

The Paper Platter

  • FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design - FP6 quantization is a practical way to further democratize the deployment of LLMs without significantly sacrificing model quality across complex tasks and model sizes. This paper proposes TC-FPx, the first full-stack GPU kernel design scheme with unified Tensor Core support for floating-point weights of various quantization bit-widths. This approach breaks the limitations of the underlying GPU hardware, allowing the GPU to support linear-layer calculations on model weights of arbitrary bit width. Currently, the TC-FPx kernel only supports NVIDIA Ampere GPUs and has only been tested and verified on A100 GPUs. (For intuition about the FP6 number format itself, see the first sketch after this list.)

  • Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone - Microsoft just launched Phi-3, a family of small open LLMs. Phi-3-mini is a highly capable model that, at just 3.8 billion parameters, is small enough to be deployed on a phone! This tiny but mighty model was trained on a scaled-up version of the dataset used for phi-2, composed of heavily filtered web data and synthetic data. Its performance rivals that of much larger models like Mixtral 8x7B and GPT-3.5 on academic benchmarks and internal testing. Plus, it's been further aligned for robustness, safety, and chat format. So, if you're looking for a powerful language model that won't break your phone (or the internet), Phi-3-mini might just be the one for you! (A quick loading sketch appears after this list.)

  • OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework - Apple quietly released OpenELM, an efficient language model family with an open-source training and inference framework, which outperforms comparably sized existing LLMs pretrained on publicly available datasets. The model uses a layer-wise scaling strategy to allocate parameters within each layer of the transformer, leading to enhanced accuracy; for example, with a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy over OLMo while requiring 2× fewer pre-training tokens. The release includes the complete framework for training and evaluating the model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations, along with code to convert models to the MLX library for inference and fine-tuning on Apple devices. The source code, pre-trained model weights, and training recipes are available on GitHub and HuggingFace. (A toy illustration of layer-wise scaling appears after this list.)

  • Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Language Models - This paper proposes Graph-of-Thought (GoT) reasoning, a novel approach that models human thought processes not merely as a chain but as a graph. GoT captures the non-sequential nature of human thinking and allows for more realistic modeling of thought processes. The paper presents a two-stage framework for GoT reasoning, which first generates rationales and then produces the final answer. The framework employs an additional graph-of-thoughts encoder for GoT representation learning and fuses the GoT representation with the original input representation through a gated fusion mechanism (sketched after this list). The proposed approach is evaluated on a text-only reasoning task (GSM8K) and a multimodal reasoning task (ScienceQA), and the results show significant improvement over the strong CoT baseline and the state-of-the-art Multimodal-CoT.

  • Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Perfect Reasoners - This paper proposes a new prompting strategy called Deeply Understanding the Problems (DUP) to enhance LLMs' comprehension of problems in complex reasoning tasks. DUP prompting consists of three stages: extracting the core question, finding the problem-solving information relevant to that core question, and generating and extracting the answer. The paper evaluates DUP prompting on ten diverse reasoning datasets and shows that it significantly outperforms Zero-Shot Chain-of-Thought (CoT) prompting across all of them. Notably, DUP prompting achieves state-of-the-art performance on the SVAMP and GSM8K datasets. (A toy sketch of the three-stage chain follows below.)
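
Curious what FP6 actually looks like as a number format? Here's a minimal, illustrative sketch of 6-bit floating-point quantization by nearest-neighbor lookup, assuming an E3M2 layout (1 sign, 3 exponent, 2 mantissa bits) and ignoring special values. This is not the TC-FPx kernel itself, just the numerical format such a kernel would serve.

```python
import numpy as np

def fp6_e3m2_values():
    """Enumerate every value representable in a 1-3-2 sign/exponent/mantissa
    layout (special values like inf/NaN are ignored for simplicity)."""
    bias = 3  # 2**(3 - 1) - 1
    vals = set()
    for s in (1.0, -1.0):
        for e in range(8):
            for m in range(4):
                if e == 0:  # subnormal numbers
                    v = (m / 4) * 2 ** (1 - bias)
                else:       # normal numbers
                    v = (1 + m / 4) * 2 ** (e - bias)
                vals.add(s * v)
    return np.array(sorted(vals))

def quantize_fp6(w):
    """Round each weight to the nearest FP6-representable value."""
    grid = fp6_e3m2_values()
    idx = np.abs(w[..., None] - grid).argmin(axis=-1)
    return grid[idx]

w = np.random.randn(4, 4).astype(np.float32)
print("max quantization error:", np.abs(w - quantize_fp6(w)).max())
```

A real kernel obviously doesn't do nearest-neighbor search at runtime; the point of TC-FPx is to keep weights packed at irregular bit-widths in memory and dequantize them on the fly into a Tensor Core-friendly layout.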
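
If you want to poke at Phi-3-mini yourself, here's a minimal loading sketch with Hugging Face transformers, assuming the instruct checkpoint Microsoft published on the Hub ("microsoft/Phi-3-mini-4k-instruct"); at release the checkpoint shipped custom code, hence trust_remote_code=True.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hub id for the released instruct checkpoint
model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "Summarize FP6 quantization in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```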
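
OpenELM's layer-wise scaling idea is easy to picture: rather than giving every transformer layer the same width, the number of attention heads and the FFN width grow gradually from the first layer to the last. A toy sketch follows, with made-up scaling ranges (the actual alpha/beta values are in the paper and repo):

```python
def layerwise_scaling(num_layers, dim, head_dim=64,
                      alpha=(0.5, 1.0), beta=(2.0, 4.0)):
    """Linearly interpolate per-layer attention and FFN budgets.
    alpha scales the attention-head count, beta the FFN width;
    these ranges are illustrative, not OpenELM's actual values."""
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)      # 0 at the first layer, 1 at the last
        a = alpha[0] + t * (alpha[1] - alpha[0])
        b = beta[0] + t * (beta[1] - beta[0])
        configs.append({
            "layer": i,
            "n_heads": max(1, round(a * dim / head_dim)),
            "ffn_dim": round(b * dim),
        })
    return configs

for cfg in layerwise_scaling(num_layers=4, dim=1024):
    print(cfg)
```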
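
The gated fusion step in the GoT paper is the easiest piece to sketch: a learned sigmoid gate decides, feature by feature, how much of the graph-of-thoughts representation to mix into the text representation. Shapes and naming below are illustrative, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, h_text, h_got):
        # g in (0, 1): per-feature weight on the GoT signal
        g = torch.sigmoid(self.gate(torch.cat([h_text, h_got], dim=-1)))
        return g * h_got + (1 - g) * h_text

fusion = GatedFusion(dim=768)
h_text = torch.randn(2, 16, 768)  # (batch, seq, dim) text encoder states
h_got = torch.randn(2, 16, 768)   # GoT encoder states, same shape assumed
print(fusion(h_text, h_got).shape)  # torch.Size([2, 16, 768])
```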
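
And since DUP is just a three-call prompt chain, it's easy to sketch too. The wording below paraphrases the three stages described in the paper rather than quoting its exact prompts, and `llm` stands in for any chat-completion call.

```python
def dup_prompt(llm, problem: str) -> str:
    """Deeply Understanding the Problems, as a three-stage prompt chain."""
    # Stage 1: extract the core question
    core = llm(f"Please extract the core question from the following "
               f"problem:\n{problem}")
    # Stage 2: gather the problem-solving information it needs
    info = llm(f"Extract the information most useful for answering the "
               f"core question [{core}] from:\n{problem}")
    # Stage 3: answer, conditioned on both
    return llm(f"{problem}\nHint: {info}\nNow answer the core question "
               f"[{core}] step by step.")

# demo with a stand-in model that just echoes prompt sizes
fake_llm = lambda prompt: f"<reply to a {len(prompt)}-char prompt>"
print(dup_prompt(fake_llm, "Tom has 3 apples and buys 2 more. How many now?"))
```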

The Business Meltdown

  • Tau Robotics came out of stealth last week with a demo featuring autonomous robot arms that learn in the real world. The system is fully autonomous and controlled by a single neural network, which is very impressive, and the hardware costs around $1,400. The company released a demo video, sped up 1.5x, on its Twitter feed. No more information has been provided beyond that tweet so far, and the company's website still appears to be empty; we're eager to learn more.

  • Sanctuary AI, a company on a mission to create human-like intelligence in general-purpose robots, just released Gen 7 of its Phoenix humanoid robot. This latest generation of Phoenix features a range of improvements to both hardware and AI software, making it one of the most sophisticated human behavioral data capture technologies available today. The improvements include increased uptime, faster build and commissioning speed, a reduced bill of materials, increased range of motion in the wrists, hands, and elbows, further miniaturized hydraulics, improved visual acuity and tactile sensing, and a 50x increase in the speed at which new tasks can be automated. The company's CEO and Co-Founder, Geordie Rose, expressed excitement about the progress made in just 11 months, calling the system the closest available analogue to a person.

  • Synthesia, an AI startup backed by Nvidia, has introduced a new upgrade for its AI avatars that enables them to convey human emotions and movements. The company's "Expressive Avatars" can express emotions based on text instructions for corporate presentations, marketing, and training purposes. The new avatars were developed by training on footage of actual humans reading scripts in a studio, which helped them track lip movements and render emotive expressions more accurately. The avatars are available in over 130 languages, can provide their own closed captions, and can even clone users' own voices. Synthesia counts at least half of the Fortune 100 companies as clients and provides services to over 55,000 enterprises. The UK-based company was founded in 2017 and has reached a valuation of nearly $1 billion. Thanks to its narrower focus on humanlike avatars for business use, Synthesia has sidestepped some of the hype and fierce competition seen between chatbot models like OpenAI's ChatGPT and Google's Gemini. You can try it here: https://bit.ly/49WUSDJ

  • Adobe announced its Firefly Image 3 model and VideoGigaGAN. The former is a new image-generation model with improvements in quality, stylization, speed, and detail, and it powers improved generative AI features in Photoshop. The latter upscales video quality by 8x with AI-enhanced details. These announcements build on the generative AI features Adobe has already rolled out across its platform.

  • Japanese robotics firm Reazon showed off teleoperated demos of ReazonCobot, the startup's new low-cost and robust household collaborative robot with chest-mounted control, arm elevation, a mobile chassis, and lemon-picking abilities.

  • Chinese startup Stardust revealed the Astribot S1, a new bot that learns through imitation learning. It's not a true humanoid (it moves on wheels), but the demo is impressive.

  • MyShell and MIT introduced OpenVoice V2. It's a new text-to-speech model that can clone any voice and speak any language. It is fully open-source and free for commercial use.

The Community Corner

The gap between open-source and proprietary AI models is shrinking, and it's getting pretty exciting! With the release of Llama 3, we're seeing some impressive progress. Check out this cool visualization of how performance has evolved over time. It's like watching a race to the top!
