Home » SQL & PL/SQL » SQL & PL/SQL » Combining multiple regexp_like statements & correct regex syntax
Combining multiple regexp_like statements & correct regex syntax [message #679089] Mon, 03 February 2020 10:29 Go to next message
mgarret40
Messages: 1
Registered: February 2020
Junior Member
/foru/forum/fa/14266/0/Hi all,

I’m working on creating various 0/1 product flags category variables based on multiple substrings within a string field. The string values contained in the category_description field don’t stick to a consistent classification pattern. Notice for each flag variable I’m using two regexp_like statements. The two things I’m trying to accomplish are.

1. For each category flag variable combine the two regexp_like statements into only one regexp_like statement

2. I’m trying to correct some of the issues I’m running to with the regex syntax.

**Swimsuit classifying categories which actually dress suits and not swimsuits or flagging jumpsuits as swimsuits
**categories is being flagged as swimwear when it represents just shorts. Example: Men > Clothing > Shorts & Swimwear > Shorts
Situations where there are no spaces between substrings. Example: where the description Layette/Infant Unisex - Accessories – Sale is not being flagged as baby when it contains the word infant

Below is my current code, sample data, and I've attached an image on the final output I'm looking for.

All help will be greatly appreciated.

Sample data:

CREATE TABLE category_flags
(
category_description VARCHAR(80)
);
INSERT INTO category_flags
(category_description)
VALUES

("Baby > Basics > Accessories > Hats,Scarves, & Gloves")
("Men > Shoes & Accessories > Hats")
("Men > Shoes & Accessories > Ties")
("Women > Clothing > Jackets")
("Women > Shoes & Accessories > Hats")
("Kids > Boys > Jackets")
("Tall/Men/Outerwear/Cloth/Sale")
("Men > Bottoms > Swim")
("Men > Bottoms > Swimwear")
("US/Big_Tall/Men/Chinos/Flat_Front")
("men>clothing>sweatshirts")
("men>clothing>swimwear")
("Children/Girls/Outfit")
("Men > Accessories > Fragrance")
("Black_Top/Men/Dress_Shirts/Slim_Fit/Sale")
("Men > Bottoms > Suits, Sport Coats & Trousers")
("Women > Tops > Sweaters")
("Kids > Accessories > Boy's Accessories > Bags")
("boys>accessories>socks")
("Blue_Look/Women/Pants/Sale")
("women > Clothing/Knits/> Jackets")
("Women > Clothing > Jackets")
("Women > Shoes & Accessories > Hats")
("Women > Clothing > Jackets")
("Dresswear/Women > Belts")
("clothing > men >shorts & swim trunks > swimtrunks")
("Men > Clothing > Shorts & Swimwear > Shorts")
("men>clothing>swimsuits")
("boys>accessories>socks")
("Layette/Infant Unisex - Accessories")
("Layette/Infant Unisex - Accessories - Sale")
("Basics > Baby Boy & Girl> Accessories > Hats,Scarves, & Gloves")
("Women/Clothing & Shoes/Swimwear")
("Evening/Men > Shoes & Accessories > Ties")
("Summer/Child/Jumpers")
("women/fall/clothing & accessories/jumpsuits")
("Women/Clothing & Swimwear/Skirts")
);

Current script:

select
category_description,
case when regexp_like(category_description, '(.[^o]|[^w]o)men|^men','i')
and not regexp_like(category_description, 'accessories|hats|bags|belts|shoes|socks|ties' ,'i') then 1 else 0 end as Mens_Apparel,

case when regexp_like(category_description, '(.[^o]|[^w]o)men|^men','i')
and regexp_like(category_description, 'accessories|hats|bags|belts|shoes|socks|ties' ,'i') then 1 else 0 end as Mens_Accessories,


case when regexp_like (category_description, 'women','i')
and not regexp_like(category_description, 'accessories|hats|bags|belts|shoes|socks|ties' ,'i') then 1 else 0 end as Womens_Apprarel,

case when regexp_like (category_description, 'women','i')
and regexp_like(category_description, 'accessories|hats|bags|belts|shoes|socks|ties' ,'i') then 1 else 0 end as Womens_Accessories ,


case when regexp_like (category_description, 'boys?|girls?|junior|child(ren)?|kids?','i')
and not regexp_like (category_description,'infant|babys?|hidden|womens?|mens?|gift cards?|test','i') then 1 else 0 end as Kids_Products ,


case when regexp_like(category_description,'toddler|infant|babys?|','i')
and not regexp_like (category_description,'hidden|womens|mens|^gift cards','i') then 1 else 0 end as Baby_Products,

case when regexp_like(category_description, '(.[^o]|[^w]o)men|^men','i')
and regexp_like(category_description,'swim(wear)?|swim(suit)?','i')then 1 else 0 end as Mens_Swimwear ,

case when regexp_like(category_description,'womens','i')
and regexp_like(category_description,'(swim(wear|suits))','i') then 1 else 0 end as Womens_Swimwear

from category_flags


  • Attachment: output.png
    (Size: 98.76KB, Downloaded 128 times)
Re: Combining multiple regexp_like statements & correct regex syntax [message #679090 is a reply to message #679089] Mon, 03 February 2020 10:37 Go to previous messageGo to next message
Michel Cadot
Messages: 67290
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Welcome to the forum.
Please read the OraFAQ Forum Guide and How to use [code] tags and make your code easier to read.

Also always post your Oracle version, with 4 decimals (query v$version), as often solution depends on it.

With any SQL or PL/SQL question, please, Post a valid/working Test case: create table (including all constraints) and insert statements along with the result you want with these data then we will be able work with your table and data. Explain with words and sentences the rules that lead to this result.

Re: Combining multiple regexp_like statements & correct regex syntax [message #679092 is a reply to message #679090] Tue, 04 February 2020 04:59 Go to previous messageGo to next message
_jum
Messages: 571
Registered: February 2008
Senior Member
See identical thread in Oracle Community
Re: Combining multiple regexp_like statements & correct regex syntax [message #679097 is a reply to message #679089] Tue, 04 February 2020 11:48 Go to previous messageGo to next message
flyboy
Messages: 1886
Registered: November 2006
Senior Member
Hi,

I do not know what do you expect as the whole assignment is not specifically related to Oracle.
Regular expression do not seem to be capable enough to treat these quite complex rules (can you precisely state them?) based on keyword positions among other terms, like conditional classification of the keyword "swimwear".
On the other hand, other cases (keywords "jumpsuit" and "infants") seem to be classified as per requirement in your output.
Anyway, adding another categories would be quite a pain.

Generally, this looks like a job for https://en.wikipedia.org/wiki/Lexical_analysis with some flavour of https://en.wikipedia.org/wiki/Natural_language_processing algorithms.
This approach may be quite an overkill for your task, it depends on overall requirement. Stating grammar covering all possible cases (and many others emerging later) may be difficult as well.

Just a note to your current regular expression masks: they would classify e.g. the value "garments" to "men".
This can be fixed by checking non-alpha character right before and after the word "men" in this mask.
There would be no need for checking the occurrance of "wo" in the string beginning to distinguish it from "women".
However, just mask change will not help in treating multiple occurrances of checked keywords in specific positions with different meanings.
Re: Combining multiple regexp_like statements & correct regex syntax [message #679102 is a reply to message #679092] Wed, 05 February 2020 01:25 Go to previous message
Michel Cadot
Messages: 67290
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

_jum wrote on Tue, 04 February 2020 11:59
See identical thread in Oracle Community
Same remarks to him and no more replies/feedback from him than here.

Previous Topic: Delete statement to retain last 90 days of data
Next Topic: Previous week's first and last day
Goto Forum:
  


Current Time: Fri Aug 07 23:18:46 CDT 2020