US Equity Markets Training Data

Russell 1000 and 2000 Companies
A data snapshot of more than 2000 US public companies' financials, stats and calculations for data science training purposes

  • Fundamental analysis
  • Profitability analysis
  • Company segmentation and benchmarking
  • Performance analysis with decision trees


This data product contains a snapshot of data on more than 2,000 United States public companies listed on the NYSE and NASDAQ exchanges.

The dataset is designed for data science training purposes, with rich attributes, variables and calculations, so data scientists can test different hypotheses and algorithms with their favourite tools.


There are almost 3,000 public companies in this dataset.

Sector                  Number of Companies
Finance                 520
Consumer Services       514
Health Care             466
Technology              359
Capital Goods           238
Basic Industries        160
Consumer Non-Durables   140
Energy                  128
Public Utilities        108
Consumer Durables       85
Miscellaneous           84
Transportation          57


There are 35 variables that are from financials, price stats, scores and curated calculations. You can check out the dictionary to learn the variables.

Data Collection Methodology

  • Data is cleansed and organized to provide an analysis-ready dataset
  • Variables are chosen from key financial items and stats
  • There are curated scores and calculations so you can train your models with rich variety
  • This dataset is created for ML training purposes and is not updated continuously. If you need fresh, quality-checked daily data, check out the related products below.

Key Features

  • Rich content for analysis and ML training
  • Updated once a year
  • Covers Russell 3000 Companies
  • Covers S&P 500


49 Data Columns

Security ID (security_id)

An identifier which is the concatenation of the ticker and the security name

Ticker (_ticker)

The one- to five-character identifier for each security

Security Name (_security)

Name of the security

Ticker and Security name (_ticker_security)

An identifier which is the concatenation of the ticker and the security name

State (state)

If country of the issuer is United States then the name of the state, otherwise N/A

Exchange (exchange)

Security exchange name

Sector (sector)

Sector of the company as classified by NASDAQ

Industry (industry)

Industry of the company as classified by NASDAQ

Is Russell 1000 (is_russell1000)

T(rue) if the ticker is listed in Russell 1000 index, otherwise F(alse)

Is Russell 2000 (is_russell2000)

T(rue) if the ticker is listed in Russell 2000 index, otherwise F(alse)

Is S&P 500 (is_sp500)

T(rue) if the ticker is listed in S&P 500 index, otherwise F(alse)

Is NASDAQ 100 (is_nasdaq100)

T(rue) if the ticker is listed in NASDAQ 100 index, otherwise F(alse)

Is Dow Industrial (is_dowindustrial)

T(rue) if the ticker is listed in Dow Industrial index, otherwise F(alse)
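The index membership columns can be converted to booleans before filtering. The following is a minimal sketch, assuming the flags are stored as the strings "T" and "F" (the exact encoding in the file is not confirmed here). The helper `parse_flag` and the sample rows are hypothetical.

```python
def parse_flag(value: str) -> bool:
    """Map a 'T'/'F' style flag to a bool (assumed encoding)."""
    text = value.strip().upper()
    if text in ("T", "TRUE"):
        return True
    if text in ("F", "FALSE"):
        return False
    raise ValueError(f"unexpected flag value: {value!r}")


# Hypothetical rows, using column names from this dictionary.
rows = [
    {"_ticker": "AAA", "is_sp500": "T", "is_russell2000": "F"},
    {"_ticker": "BBB", "is_sp500": "F", "is_russell2000": "T"},
]

# Keep only the tickers flagged as S&P 500 members.
sp500_tickers = [r["_ticker"] for r in rows if parse_flag(r["is_sp500"])]
print(sp500_tickers)  # ['AAA']
```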

Total Assets (_bs_total_assets)

Total Assets is the sum total of all gross investments, cash and equivalents, receivables, and other tangible and intangible assets as they are presented on the balance sheet

Total Debt (_bs_total_debt)

Total debt reported on the company's balance sheet, including short-term debt, the short-term portion of long-term debt, and long-term debt

Day Close (_close)

The price per share for the last trade on the quote_date during regular market hours

Enterprise Value (_enterprise_value)

Enterprise Value (EV) represents a more comprehensive valuation of the company than its Market Cap. Market Cap represents only equity capitalization, whereas EV also includes debt capitalization and cash on hand. A buyer of the enterprise has to pay the Market Cap to the shareholders and the total debt to the debt holders, but can use the company's cash to make some of these payments.
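The relationship described above can be sketched as a simple calculation. This is an illustration only: the provider's exact formula is not stated, and some EV definitions include further items such as preferred stock or minority interest. The function name and the sample figures are hypothetical.

```python
def enterprise_value(market_cap: float, total_debt: float, cash: float) -> float:
    """EV = market cap + total debt - cash and cash equivalents (simplified)."""
    return market_cap + total_debt - cash


# Inputs correspond to _market_cap, _bs_total_debt and
# bs_cash_and_cash_equivalents; the values are made up.
ev = enterprise_value(market_cap=1_000.0, total_debt=300.0, cash=100.0)
print(ev)  # 1200.0
```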

Net Income (_inc_net_income)

Net income is equal to net earnings (profit/loss), calculated as Gross Profit less total operating expenses, depreciation, interest, taxes and other income and expenses from continuing and discontinued operations.

Operating Income (_inc_operating_income)

Operating income is an accounting figure that measures the amount of profit realized from a business's operations, after deducting operating expenses such as wages, depreciation, and cost of goods sold. Operating income is the Gross Profit less Total Operating Expenses

Total Revenue (_inc_revenue)

Total Revenue is the total billings generated from normal business operations net of discounts and deductions for returned merchandise

Market Capitalization (_market_cap)

Market capitalization refers to the total dollar market value of a company's outstanding shares. It is calculated by multiplying a company's shares outstanding by the current market price of one share. The investment community uses this figure to determine a company's size, as opposed to using sales or total asset figures.
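As a small illustration of the calculation described above, the sketch below multiplies shares outstanding by the closing price. Using the `_close` price as "the current market price" is an assumption, and the figures are invented.

```python
def market_cap(shares_outstanding: float, close: float) -> float:
    """Shares outstanding multiplied by the price of one share."""
    return shares_outstanding * close


# 50 million shares at a close of 20.0 per share.
cap = market_cap(shares_outstanding=50_000_000, close=20.0)
print(cap)  # 1000000000.0
```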

Operating Margin (_operating_margin)

A measure of a business's operating profitability, expressed as a percentage. It measures the proportion of revenue that remains after the operating expenses most closely associated with the products and services sold have been deducted.
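A minimal sketch of this ratio, assuming it is computed as operating income divided by total revenue (`_inc_operating_income` over `_inc_revenue`). Whether the column stores 15 or 0.15 for a 15% margin is not stated; the sketch returns percentage points. The function name is hypothetical.

```python
from typing import Optional


def operating_margin_pct(operating_income: float, revenue: float) -> Optional[float]:
    """Operating income as a percentage of revenue; None when revenue is zero."""
    if revenue == 0:
        return None
    return 100.0 * operating_income / revenue


print(operating_margin_pct(150.0, 1_000.0))  # 15.0
print(operating_margin_pct(150.0, 0.0))      # None
```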

PE (Price to Earnings) Ratio (_pe_ratio)

The price-to-earnings ratio (P/E ratio) is the ratio for valuing a company that measures its current share price relative to its per-share earnings (EPS). The price-to-earnings ratio is also sometimes known as the price multiple or the earnings multiple.
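The ratio can be sketched as the close price divided by EPS. How the dataset treats companies with zero or negative earnings is not stated, so returning `None` in that case is an assumption made for this example.

```python
from typing import Optional


def pe_ratio(close: float, eps: float) -> Optional[float]:
    """Share price divided by earnings per share; None when EPS is not positive."""
    if eps <= 0:
        return None
    return close / eps


print(pe_ratio(close=50.0, eps=2.5))   # 20.0
print(pe_ratio(close=50.0, eps=-1.0))  # None
```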

Volume (_volume)

The number of shares that change hands during a quote_date. Volume may also be quoted over longer periods of time, such as a week.

Fundamental Score (ab_fundamental_score)

Fundamental score is an alternative index for the given security calculated by Alta Bering

Leverage (ab_leverage)

Leverage is the ratio of the security's total debt to its total assets
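A short sketch of the ratio, using values that would come from `_bs_total_debt` and `_bs_total_assets`. The function name and the numbers are illustrative only.

```python
def leverage(total_debt: float, total_assets: float) -> float:
    """Total debt divided by total assets."""
    if total_assets <= 0:
        raise ValueError("total_assets must be positive")
    return total_debt / total_assets


print(leverage(total_debt=300.0, total_assets=1_200.0))  # 0.25
```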

Price Deviation (ab_price_deviation)

Ratio of close price to 50 day moving average price

50 Day Average Price (avg_price_50d)

Moving Average of close price over the past 50 trading days
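The moving average and the price deviation built on it can be sketched as follows. This assumes a simple (unweighted) average of the last 50 closes and that price deviation is the latest close divided by that average; the function names and the price series are hypothetical.

```python
from typing import Sequence


def moving_average(closes: Sequence[float], window: int = 50) -> float:
    """Simple mean of the last `window` closing prices."""
    if len(closes) < window:
        raise ValueError("not enough prices for the requested window")
    recent = closes[-window:]
    return sum(recent) / window


def price_deviation(close: float, average: float) -> float:
    """Ratio of the close price to its moving average."""
    return close / average


# 49 days at 100.0 followed by a jump to 110.0 on the last day.
closes = [100.0] * 49 + [110.0]
avg = moving_average(closes)            # (49 * 100 + 110) / 50
dev = price_deviation(closes[-1], avg)  # above 1: close is over its average
print(round(avg, 4), round(dev, 4))
```

A value above 1 means the security closed above its 50-day average, and a value below 1 means it closed below.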

Beta (3y) (beta_3y)

Beta measures the relationship between the price movements of the security and those of the market. It is the coefficient of the regression of the security's returns on the market's returns. A positive beta indicates that the security tends to move in the same direction as the overall market; a negative beta indicates the opposite.

A beta whose absolute value is greater than 1 implies that the security is more volatile than the overall market. Similarly, a beta between -1 and 1 implies a security that is less volatile than the market. Beta can be calculated as the ratio of the covariance of security and market returns to the variance of market returns. When both return series are measured as deviations from their means, this equals the sum of the products of market and security returns divided by the sum of squared market returns.
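The covariance-over-variance calculation can be sketched in plain Python. The return frequency and the market index behind `beta_3y` are not stated in this dictionary, so the return series below are invented and the function is only an illustration.

```python
from typing import Sequence


def beta(security_returns: Sequence[float], market_returns: Sequence[float]) -> float:
    """Covariance of security and market returns over variance of market returns."""
    n = len(market_returns)
    if n != len(security_returns) or n < 2:
        raise ValueError("need two equally long return series with at least 2 points")
    mean_s = sum(security_returns) / n
    mean_m = sum(market_returns) / n
    # Sums of demeaned products; the 1/(n-1) factors cancel in the ratio.
    cov = sum((s - mean_s) * (m - mean_m)
              for s, m in zip(security_returns, market_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    if var == 0:
        raise ValueError("market returns have zero variance")
    return cov / var


market = [0.01, -0.02, 0.03, 0.00]
security = [2 * r for r in market]  # moves twice as much as the market
print(round(beta(security, market), 6))  # 2.0
```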

Cash and Cash Equivalents (bs_cash_and_cash_equivalents)

Cash and cash equivalents refer to the line item on the balance sheet that reports the value of a company's assets that are cash or can be converted into cash immediately. These include bank accounts, marketable securities, commercial paper, Treasury bills and short-term government bonds with a maturity date of three months or less. Marketable securities and money market holdings are considered cash equivalents because they are liquid and not subject to material fluctuations in value.

Total Liabilities (bs_total_liabilities)

Total liabilities are the combined debts and obligations that a company owes to outside parties. All assets of a company are either owned by the entity and classified as equity or are subject to future obligations which are recorded as a liability. On the balance sheet, total liabilities plus equity must equal total assets.
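The balance sheet identity above makes it possible to derive shareholders' equity, which is not a column in this dataset, from `_bs_total_assets` and `bs_total_liabilities`. A minimal sketch with invented figures:

```python
def shareholders_equity(total_assets: float, total_liabilities: float) -> float:
    """Equity implied by the identity: assets = liabilities + equity."""
    return total_assets - total_liabilities


equity = shareholders_equity(total_assets=1_200.0, total_liabilities=700.0)
print(equity)  # 500.0
```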

Dividend Rate per Share (dividend_rate)

Dollar Dividends paid per share in the trailing twelve months

EPS (Earnings per Share) (eps)

Earnings per share (EPS) is calculated as a company's profit divided by the outstanding shares of its common stock. The resulting number serves as an indicator of a company's profitability. It is common for a company to report EPS that is adjusted for extraordinary items and potential share dilution. The higher a company's EPS, the more profitable it is considered.
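A basic version of this calculation can be sketched as net income divided by shares outstanding. Reported EPS is often based on a weighted-average share count and may be adjusted or diluted, so the `eps` column will not necessarily match this simplified figure.

```python
def basic_eps(net_income: float, shares_outstanding: float) -> float:
    """Net income divided by shares outstanding (simplified, undiluted)."""
    if shares_outstanding <= 0:
        raise ValueError("shares_outstanding must be positive")
    return net_income / shares_outstanding


print(basic_eps(net_income=250.0, shares_outstanding=100.0))  # 2.5
```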

Day High (high)

Day high is the highest price at which a stock trades over the course of the quote_date. High is higher than or equal to all other price quotes for a given quote_date.

52-Weeks High Price (high_price_52w)

A 52-week high is the highest price at which a stock has traded during the previous year

Research & Development (inc_research_and_development)

Research and development (R&D) expense refers to the cost of activities a company undertakes to innovate and introduce new products and services. It is often the first stage in the development process of a new product.

Selling, General & Administrative (inc_selling_general_and_administrative_expense)

Selling, general and administrative expense (SG&A) is the sum of all direct and indirect selling expenses and all general and administrative expenses of a company. The SG&A is comprised of all operating expenses of a business that are not included in the cost of goods sold such as corporate expenses, facility costs and marketing expenses.

Total Operating Expenses (inc_total_operating_expenses)

An operating expense is an expense a business incurs through its normal business operations. Often abbreviated as OPEX, operating expenses include rent, equipment, inventory costs, marketing, payroll, insurance, step costs, and funds allocated for research and development. It does not include COGS.

Day Low (low)

Day low is the lowest price at which a stock trades over the course of the quote_date. Low is lower than or equal to all other price quotes for a given quote_date.

52-Weeks Low Price (low_price_52w)

A 52-week low is the lowest price at which a stock has traded during the previous year

Money Flow Index (MFI) (mfi)

MFI is an index between 0 and 100 that measures market participants' enthusiasm about a security, whether it is overbought or oversold. Its calculation is the index transformation of the volume weighted ratio of total dollar trade in up days to total dollar trade (both up and down days) over the past 14 trading days.

Typically, MFI < 30 implies that the security is oversold and MFI > 70 implies that it is overbought. Just like RSI, MFI should be used with care when trading and confirmed with other indicators.
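The calculation described above can be sketched as follows. This follows the common textbook form of MFI, which classifies up and down days by the typical price (the mean of high, low and close); whether the provider uses the typical price or the close is an assumption, as is the neutral value returned when there are no up or down days.

```python
from typing import Sequence


def money_flow_index(highs: Sequence[float], lows: Sequence[float],
                     closes: Sequence[float], volumes: Sequence[float],
                     period: int = 14) -> float:
    """MFI = 100 * positive money flow / (positive + negative money flow)."""
    n = len(closes)
    if not (len(highs) == len(lows) == len(volumes) == n):
        raise ValueError("all series must have the same length")
    if n < period + 1:
        raise ValueError("need at least period + 1 observations")
    typical = [(h + l + c) / 3.0 for h, l, c in zip(highs, lows, closes)]
    positive = negative = 0.0
    for i in range(n - period, n):
        flow = typical[i] * volumes[i]  # approximate dollar value traded
        if typical[i] > typical[i - 1]:
            positive += flow
        elif typical[i] < typical[i - 1]:
            negative += flow
    total = positive + negative
    if total == 0:
        return 50.0  # no up or down days: treated as neutral (assumption)
    return 100.0 * positive / total


# Three days with high = low = close, and a 2-day period for brevity.
prices = [10.0, 12.0, 10.0]
volumes = [100.0, 250.0, 100.0]
print(money_flow_index(prices, prices, prices, volumes, period=2))  # 75.0
```

In the example, the up day trades 12.0 × 250 = 3,000 and the down day 10.0 × 100 = 1,000, so the index is 100 × 3,000 / 4,000 = 75.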

Day Open (open)

Open price at which a security first trades upon opening of the exchange on a quote_date

4 Weeks Change (prev_change_4w)

4 weeks price change of the security

8 Weeks Change (prev_change_8w)

8 weeks price change of the security

12 Weeks Change (prev_change_12w)

12 weeks price change of the security

Standard Deviation of 50 Days (price_stddev_50d)

Standard deviation of the security's close price over the last 50 days
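A minimal sketch of this statistic using the Python standard library. Whether the provider uses the sample or the population standard deviation is not stated; the sketch assumes the sample version (`statistics.stdev`), and the short window in the example is only for readability.

```python
import statistics
from typing import Sequence


def price_stddev(closes: Sequence[float], window: int = 50) -> float:
    """Sample standard deviation of the last `window` closing prices."""
    if window < 2 or len(closes) < window:
        raise ValueError("need at least `window` prices and window >= 2")
    return statistics.stdev(closes[-window:])


closes = [1.0, 2.0, 4.0, 4.0, 6.0]
sd = price_stddev(closes, window=4)  # uses 2.0, 4.0, 4.0, 6.0
print(round(sd, 4))
```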

Relative Strength Index (RSI) (rsi)

RSI (Relative Strength Index) is an index between 0 and 100 that measures the market's view of a security, indicating whether it is overbought or oversold. It is calculated as an index transformation of the ratio of the security's average appreciation to its average depreciation.

An RSI under 30 implies that the security is oversold, and an RSI above 70 implies that it is overbought. Oversold loosely means that everybody who would sell the security has already sold it, so the security should start appreciating. Overbought loosely means that everybody who would buy the security has already bought it, so the security should start depreciating. RSI is a fast leading indicator that can give false signals, so it should be used with care.
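The index transformation mentioned above is commonly written as RSI = 100 - 100 / (1 + RS), where RS is the ratio of average gain to average loss. The sketch below uses simple averages over the lookback window; the lookback length (14 is a common default) and the averaging method used for the `rsi` column are not stated, so both are assumptions.

```python
from typing import Sequence


def relative_strength_index(closes: Sequence[float], period: int = 14) -> float:
    """RSI from simple averages of gains and losses over `period` changes."""
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closing prices")
    window = closes[-(period + 1):]
    changes = [b - a for a, b in zip(window, window[1:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        # No losses in the window: 100 if there were gains, else neutral.
        return 100.0 if avg_gain > 0 else 50.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


# Changes are +1, +1, -1, +1, so RS = 0.75 / 0.25 = 3 and RSI = 75.
closes = [10.0, 11.0, 12.0, 11.0, 12.0]
print(relative_strength_index(closes, period=4))  # 75.0
```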

Shares Outstanding (shares_outstanding)

Shares outstanding refer to a company's stock currently held by all its shareholders, including share blocks held by institutional investors and restricted shares owned by the company's officers and insiders.

Update Time (update_time)

Update time is based on when Alta Data has updated the data point from the data source

Data Provider

Alta Bering

Alta Bering is a data curation and business analytics company with its roots in management consulting and decision science. Alta Bering is based in British Columbia and also offers a visual business analytics platform called EPO, which helps solve complex business problems using popular machine learning and statistical methods.

